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PART I 


Target Processors 


A major advantage of the use of a high-level language is its independence of 
the hardware its generated code will eventually run on; that is, its portability. 
However, one of the main strands of this book is the interaction of software with 
its hardware environment, and thus it is essential to use real products in both 
domains. For clarity, rather than describing a multitude of devices, most of the 
examples are based on just two microprocessors. Two, rather than one, not to 
loose sight of the portability aspects of high-level code. 

In this part I describe the Motorola 6809 and 68000/8 microprocessors, the 
chosen devices. This gives us a hardware target spectrum ranging from 8 through 
32-bit architecture. As both microprocessors share a common ancestor, the com¬ 
plexity is reduced compared with a non-related selection. Where necessary, other 
processors are used as examples, but in general the principles are si mi lar irre¬ 
spective of target. If the hardware detail seems excessive to a reader with a 
software background, much may be ignored if building the miniproject circuitry 
of Part 3 is to be o mi tted. 



CHAPTER 1 


The 6809 Microprocessor: Its 
Hardware 


The microprocessor revolution began in 1971 with the introduction of the Intel 
4004 device. This featured a 4-bit data bus, direct addressing of 512 bytes of 
memory and 128 peripheral ports. It was clocked at 108 kHz and was imple¬ 
mented with a transistor count of 2300. Within a year, the 8-bit 200kHz 8008 
appeared, addressing 16kbyte of memory and needing a 3500 transistor imple¬ 
mentation. The improved 8080 replacement appeared in 1974, followed a few 
months later by the Motorola 6800 MPU m. Both processors could directly ad¬ 
dress 64 kbytes of memory through a 16-bit address bus and could be clocked at 
up to 2 MHz. These two families, together with descendants and inspired close 
relatives, have remained the industry standards ever since. 

The Motorola 6800 MPU CD was perceived to be the easier of the two to use 
by virtue of its single 5 V supply requirement and a clean internal structure. The 
8085 MPU is the current state of the art Intel 8-bit device. First produced in 
1976, it has an on-board clock generator and requires only a single power supply, 
but has a virtually identical instruction set to the 8080 device. Soon after Zilog 
produced its Z80 MPU which was upwardly compatible with Intel's offering, then 
the market leader, with a much extended instruction set and additional internal 
registers m. 

The Motorola 6802/8 MPUs (1977) also have internal clock generators, with the 
former featuring 128 bytes of on-board RAM. This integration of support mem¬ 
ory and peripheral interface leads to the single-chip microcomputer unit (MCU) or 
micro-controller, exemplified by the 6801, 6805 and 8051 MCU fa mi lies j4]. The 
6809 MPU introduced in 1979 EOIZI was seen as Motorola's answer to Zilog's Z80 
and these both represent the most powerful 8-bit devices currently available. By 
this date the focus was moving to 16- and 32-bit MPUs, and it is unlikely that 
there will be further significant developments in general-purpose 8-bit devices. 
Nevertheless, these latter generation 8-bit MPUs are powerful enough to act as the 
controller for the majority of embedded control applications, and their architec¬ 
ture is sophisticated enough to efficiently support the requirements of high-level 
languages; more of which in later chapters. Furthermore, many MCU fa mi lies 
have a core and language derived from their allied 8-bit MPU cousins. 
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ARCHITECTURE 3 


1.1 Architecture 

The internal structure of a general purpose microprocessor can be partitioned 
into three functional areas: 

1. The mi ll. 

2. Register array. 

3. Control circuitry. 

Figure ITT! shows a simplified schematic of the 6809 MPU viewed from this per¬ 
spective. 

THE MILL 

A rather old fashioned term used by Babbage GO for his mechanical computer 
of the last century to identify the arithmetic and logic processor which ' ground’ 
the numbers. In our example the 6809 has an 8-bit arithmetic logic unit (ALU) 
implementing Addition, Subtraction, Multiplication, AND, OR, Exclusive-OR, NOT 
and Shift operations. Associated with the ALU is the Code Condition (or Sta¬ 
tus) register (CCR). Five of the eight CCR bits indicate the status of the result of 
ALU processes. They are: C indicating a Carry or borrow, V for 2’s complement 
overflow, Z for a Zero result, N for Negative (or bit 7 = 1) and H for the Half carry 
between bits 3 and 4. These flags are set as a result of executing an instruction, 
and are normally used either for testing and acting on the status of a process, or 
for multiple-byte operations. The remaining three bits are associated with inter¬ 
rupt handling. The I bit is used to lock out or mask the IRQ interrupt, and the 
F bit carries out the same function for the FIRQ interrupt. During an interrupt 
service routine the E flag may be consulted to see if the Entire register state has 
been saved (IRQ, NMI and SWI) or not (FIRQ). More details are given in Section 6.1. 

REGISTER ARRAY 

The 6809 has two Data registers, termed Accumulators A and B. These Data reg¬ 
isters are normally targeted by the ALU as the source and destination for at least 
one of its operands. Thus ADDA #50 adds 50 to the contents of Accumulator_A (in 
register transfer language, RTL, this is symbolized as [A] <- [A] + 50, which 
reads ' the contents of register A become the original contents of A plus 50’). Op¬ 
erations requiring one operand can seemingly be done directly on external mem¬ 
ory; for example, INC 6000/r which increments the contents of location 6000 h 
([6000] <- [6000] + 1). The suffix h indicates the hexadecimal number base, 
whilst b denotes binary. However, in reality the MPU executes this by bringing 
down the contents of 6000 h (written as [6000]), uses the ALU to add one and 
returns it. Whilst this fetch and execute process is invisible to the programmer, 
the penalty is space and time; INC M (3 bytes length) takes 7 ps and INCA or INCB 
(1 byte length) takes 2 ps (at a 1 MHz clock rate). Thus while it is always better 
to use the Data registers for operands, this is difficult in practice because there 
are only two such registers. Unlike the older 6800 MPU, the 6809’s two 8-bit Data 
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Figure 1.1 Internal 6809/6309 structure. 
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ARCHITECTURE 5 


registers can be concatenated to one 16-bit double register A:B; the D Accumu¬ 
lator. A few operations such as Add (e.g. ADDD #4567) can directly handle this. 
But although the 6809 has pretensions to be a 16-bit MPU, the ALU is only 8-bits 
wide and instructions such as this require two passes; but they are nevertheless 
faster than two single operations. 

Six dedicated Address registers are accessible to the programmer and are as¬ 
sociated with generating addresses of program and operand bytes external to the 
processor. The Program Counter (PC) always points to the current program byte 
in memory, and is automatically incremented by the number of operation bytes 
during the fetch. It normally advances monotonically from its start (reset) value, 
with discontinuities occurring only at Jump or Branch operations, and internal 
and external interrupts. 

Two Index registers are primarily used when a computed address facility is de¬ 
sired. For example an Index register may be set up to address or point to the first 
element of a byte array. At any time after this, the nth element of this array can be 
fetched by augmenting the contents of the Index register by n . Thus the instruc¬ 
tion L DA 6 ,Xbrings down array [6] to Accumulator_A ([A] <- [[X]+6]). Index 
registers can also be automatically or manually incremented or decremented and 
thus can systematically step through a table or array. The 6809 does not have a 
separate ALU for computed address generation, and this can make the execution 
of such operations rather lengthy. Sometimes Index registers are used, rather 
surreptitiously, to perform simple 16-bit arithmetic, for example counting loop 
passes. An example is given in the listing of Table [2T9l 

The System Stack Pointer (SSP) register (also known as Hardware Stack Pointer) 
is normally used to identify an area of RAM used as a temporary storage area, 
to facilitate the implementation of subroutines and interrupts. These techniques 
are discussed in Chapters 5 and 6. Rather unusually the 6809 also has a User 
Stack Pointer (USP), which can be usefully employed to point to an area of RAM 
which can be used by the programmer to place data for retrieval later and will 
not get mixed in with the automatic action of the SSP. Both Stack Pointers can 
also be used as Index registers. 

The address size of most 8-bit MPUs is 16-bits wide, allowing direct access 
to 65,536 (2 16 ) bytes. With a data bus of only 8-bits width, instructions which 
specify absolute addresses will be at least three bytes long (one or more bytes 
for the operation code and two for the address). As well as needing space, the 
three fetches take time. To reduce this problem, the 6800 and 6502 processors 
use the concept of zero page addressing. This is a shortform absolute address 
mode which assumes that the upper address byte is 00 h. Thus in 6800 code, 
loading data from location 005Fh (LDAA 005F) can be coded as: B6-00-5F (4 cy¬ 
cles) using the 3-byte Extended Direct address mode or 96-5F (3 cycles) with the 
2-byte Direct address mode. In the 6809 MPU this concept has been extended in 
that the direct page can be moved to any 256-byte segment based at 00 to FF h, 
the segment number being held in the Direct Page register (DP). Thus, supposing 
locations 8000 - 8 OFF /1 hold peripheral interface devices which are frequently be¬ 
ing accessed, then transferring the segment number 80 h into the DP means that 
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the instruction LDA 5F, coded as 96-5F, actually moves data from 805Fh into 
Accumulator_A. When the 6809 is Reset, the DP is set to 00 h and, unless its value 
is changed, direct addressing is equivalent to zero page addressing. The DP can 
be changed dynamically as the program progresses, but this is worthwhile only 
if more than eight accesses within a page are to be made. 

CONTROL CIRCUITRY 

The remaining registers shown in Fig. 11.11 are invisible to the programmer, in 
that there is no direct access to their contents. Of these, the Instruction decoder 
represents the intelligence' of the MPU. In essence its job is to marshal all available 
resources in response to the operation code word fetched from memory. This 
sequential control function is the most complex internal process undertaken by 
the MPU; however, its design is beyond the scope of this text. References (9] [10 
are useful background reading in this regard. Suffice to say that the 6809, like 
its earlier relatives, uses a random logic circuit for its decoder implementation. 
This provides for the highest implementation speed but at the expense of a less 
structured set of programming operations. 


1.2 Outside the 6809 

The 6809 MPU is available in a 40-pin package, whose pinout is shown in Fig. ll.2l 
The 40 signals can be conveniently divided into three functional groups, data, 
address and control. Unlike the 808x family, all signals are non-multiplexed, that 
is they retain the same function throughout the clock cycle, see Fig. 11.31 Signals 
are all Transistor-Transistor Logic (TTL) voltage-level compatible. 

DATA BUS d(n) 

A single bidirectional 8-bit data bus carries both instruction and operand data 
to and from the MPU (Read and Write respectively). When enabled, data lines 
can drive up to four 74 LS loads and a capacitive loading of 130pF without exter¬ 
nal buffering. Data lines are high-impedance (turned off) when the processor is 
halted or in a direct memory access (DMA) mode. 

ADDRESS BUS a(n) 

Sixteen address lines can be externally decoded to activate directly up to 2 16 byte 
locations which can be placed on the common data bus. During cycles when 
the MPU is internally processing, the address bus is set to all ones (FFFF/i) and 
the data bus to Read. When enabled, up to four 74 LS loads and 90 pF can be 
driven. Activating Halt or DMA/BREQ turns off (or floats) these bus lines. 

CONTROL BUS 

All MPUs have si mi lar data and address buses, but differ considerably in the 
miscellany of functions conveniently lumped together as the control bus. These 
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6809 


Figure 1.2 6809 pinout. 

indicate to the outside world the status of the processor, or allow these external 
circuits control over the processor operation. 

Power (V cc , V ss ) 

A single 5 V± 5% supply dissipating a maximum of 1.0 W (200 mA). The analogous 
Hitachi 6309 CMOS MPU dissipates 60 mW during normal operation and 10 mW 
in its sleep mode. 

Read/Write (R/W) 

Used to indicate the status of the data bus, high for Read and low for Write. Halt 
and DMA/BREQ float this signal. 

Halt 

A low level here causes the MPU to stop running at the end of the present instruc¬ 
tion. Both data and address buses are floated, as is R/W. While halted, the MPU 
does not respond to external interrupt requests. The system clocks (E and Q) 
continue running. 

DMA/BREQ 

This is si mi lar to Halt in that data, address and R/W signals are floated. How¬ 
ever, the MPU does not wait until the end of the current instruction execution. 
This gives a response delay (sometimes called a latency) of 1 \ cycles, as opposed 
to a worst-case Halt latency of 21 cycles 0. The payback is that because the 
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processor clock is frozen, the internal dynamic registers will lose data u nl ess 
periodically refreshed. Thus the MPU automatically pulls out of this mode every 
14 clock cycles for an internal refresh before resuming (cycle stealing). 


Reset 

A low level at this input will reset the MPU. As long as this pin is held low, the 
vector address FFFEh will be presented on the address bus. On release, the 16-bit 
data stored at FFFEh and FFFFh will be moved to the Program Counter; thus the 
Reset vector FFFE:Fh should always hold the restart address (see Fig. 16.41 . 

Reset should be held low for not less than 100 ms to permit the internal clock 
generator to stabilize after a power switch on. As the Reset pin has a Schmitt- 
trigger input with a threshold (4 V minimum) higher than that of standard TTL- 
compatible peripherals (2 V maximum), a simple capacitor/resistor network may 
be used to reset the 6809. As the threshold is high, other peripherals should be 
out of their reset state before the MPU is ready to run. 

Non-Maskable Interrupt (NMI) 

A negative edge (pulse width one clock cycle minimum) at this pin forces the MPU 
to complete its current instruction, save all internal registers (except the System 
Stack Pointer, SSP) on the System stack and vector to a program whose start ad¬ 
dress is held in the NMI vector FFFCDh. The E flag in the CCR is set to indicate 
that the Entire group of MPU registers (known as the machine state) has been 
saved. The I and F mask bits are set to preclude further lower priority interrupts 
(i.e. IRQ and FIRQ). If the NMI program service routine is terminated by the Re- 
Turn from Interrupt (RTI) instruction, the machine state is restored and the 
interrupted program continues. After Reset, NMI will not be recognized until 
the SSP is set up (e.g. LDS #T0S+1 points the System Stack Pointer to just over 
the top of the stack, TOS). More details are given in Section 6.1. 

Fast Interrupt Request (FIRQ) 

A low level at this pin causes an interrupt in a similar manner to NMI. However, 
this time the interrupt will be locked out if the F mask in the CCR is set (as it is 
automatically on Reset). If F is clear, then the MPU will vector via FFF6:7/i after 
saving only the PC and CCR on the System stack. The F and I masks are set to 
lock out any further interrupts, except NMI, and the E flag cleared to show that 
the Entire machine state has not been saved. 

As FIRQ is level sensitive, the source of this signal must go back high before 
the end of the service routine. 


Interrupt Request (IRQ) 

A low level at this pin causes the MPU to vector via FFF8:9h to the start of the 
IRQ service routine, provided that the I mask bit is cleared (it is set automatically 
at Reset). The entire machine state is saved on the System stack and I mask set 
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to prevent any further IRQ interrupts (but not FIRQ or NMI). As in FIRQ, the IRQ 
signal must be removed before the end of the service routine. On RTI the machine 
state will be restored, and as this includes the CCR, the I mask will return low 
automatically. 

Bus Available, Bus Status (BA, BS) 

These are status signals which may be decoded for external control purposes. 
Their four states (BA, BS) are: 

00 : Normally running 

01 : Interrupt or Reset in progress 

10 : A software SYNC is in progress (see Section 6.2) 

11 : MPU halted or has granted its bus to DMA/BREQ 

Clock (XTAL, EXTAL) 

An on-chip oscillator requires an external parallel-resonant crystal between the XTAL 
and EXTAL pins and two small capacitors to ground (see Fig. ll3.lt . The internal 
oscillator provides a processor clocking rate of one quarter of the crystal reso¬ 
nant frequency. The basic 6809 MPU is a 1 MHz device requiring a 4 MHz crystal, 
whilst the 68A09 and 68B09 1.5 and 2 MHz versions need 6 and 8 MHz crystals 
respectively. The Hitachi 6309 MPU is available in a 3 MHz version. In all cases 
there is a lower frequency limit at 100 kHz, due to the need to keep the internal 
dynamic registers constantly refreshed. If desired, an external TTL-level oscilla¬ 
tor may be used to drive EXTAL, with XTAL grounded. 

The 6809E/6309E MPUs do not have an integral clock generator, but provide 
additional control functions suitable for multi-processor configurations. 

Enable, Quadrature (E, Q) 

These are buffered clock signals from the internal (or external) clock generator. 
They are used to synchronize devices taking data from or putting data on the 
data bus. We will look at the timing relationship between these signals and the 
main buses in the following section. E is sometimes labelled </> 2 after the second 
phase clock signal needed for the 6800 MPU, which fulfilled a si mi lar role. 

Memory Ready (MRDY) 

This is a control input to the internal clock oscillator. By activating MRDY, a 
slow external memory or peripheral device can freeze the oscillator until its data 
is ready. This is subject to a ma xi mum of 10 ms, in order to keep the MPU's 
dynamic registers refreshed. 


1.3 Making the Connection 

A microprocessor monitors and controls external events by sending and receiving 
information via its data bus through interface circuitry. In order to interface to 
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Figure 1.3 A snapshot of the 6809 MPU reading data from a peripheral device. Worst-case 1 MHz 
device times are shown. 

a MPU, it is necessary to understand the interplay between the relevant buses and 
control signals. These involve sequences of events, and are usually presented as 
timing or flow diagrams. 

Consider the execution of the instruction LDA 6000h([A] <- [6000h]). This 
instruction takes four clock cycles to implement; three to fetch down the 3- 
byte instruction (B6-60-00) and one to send out the peripheral (memory or oth¬ 
erwise) address and put the resulting data into Accumulator_A. Figure [L3l shows 
a somewhat simplified state of affairs during that last cycle, with the assumption 
of a 1 MHz clock frequency. The address will be out and stable by not later than 
25ns before Q goes high (t AQ ). The external device (at 6000 h in our example) 
must then respond and set up its data on the bus by no later than 80 ns (£ DS r) 
before the falling edge of E, which signals the cycle end. Such data must remain 
held for at least 10 ns (t DHR ) to ensure successful latching into the internal data 
register. t AQ , t DSR , t DHR for the 68B09 2 MHz processor are 15, 40 and 10 ns re¬ 
spectively. 

Writing data to an external device or memory cell is broadly similar, as illus¬ 
trated in Fig. 1 1.41 which shows the waveforms associated with, for example, the 
last cycle of a STA 8000h (Store) instruction. 

Once again the Address and R/W signals appear just before the rising edge 
of Q, t A Q. This time it is the MPU which places the data on the bus, which will 
be stable well before the falling edge of Q. This data will disappear within 30 ns 
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Figure 1.4 Sending data to the outside world. 


after the cycle end t DHW ; the corresponding address hold time t AH is 20 ns. 

Earlier members of the 6800 family did not provide a Q clock signal. In these 
cases the end of the E signal had to be used to turn off or trigger the external 
device when writing. As there are only 30 ns after this edge before the data 
collapses, care had to be taken to ensure that the sum of the address decoder 
propagation delay plus the time data must be held at the peripheral interface 
device after the trigger event (hold time) satisfies this criterion. Because of this 
tight timing requirement, the E clock is normally routed directly to the interface 
circuitry, rather than be delayed by the address decoder (e.g. see Fig. 11.91 . With 
the 6809, it is preferable to use the falling edge of the Q clock for this purpose 
when writing. While reading of course, the peripheral interface must be enabled 
up to (and a little beyond) the end of the E cycle, at which point the MPU captures 
the proffered data. 

The basic structure of a synchronous common data bus MPU-based system is 
shown in Fig. 11.51 The term synchronous is used to denote that normal commu¬ 
nication between peripheral device and MPU is open loop, with the latter having 
no knowledge of whether data is available or will be accepted at the end of a clock 
cycle. If a peripheral responds too slowly, its garbled data will be read at the end 
of the cycle irrespective of its validity. In such cases MRDY can be used to slow 
things down, although this is considered an abnormal transition. The alternative 
closed-loop architecture is discussed on nagelTTl 
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Figure 1.5 The structure of a synchronous common-bus microcomputer. 
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As all external devices communicate to the master through a single common 
data highway, it is necessary to ensure that only one is active on any exchange. 
All microprocessors use an address bus for this purpose. Taken together with 
external decoding circuitry, each target can be assigned a specific address and 
thus enabled uniquely. As depicted in Fig. 11.51 only one decoder is used, but in 
a larger system there is likely to be one central decoder dividing the available 
memory space into zones or pages, and local decoders providing the ' fine print’. 
Memory chips of course are not single devices, but comprise a multitude of ad¬ 
dressable cells: they have their secondary decoder on-board. The 808x family 
use separate address buses for memory and peripheral selection. As well as re¬ 
quiring additional pins on the package, special instructions must be provided to 
use them. 

There is nothing special about address decoder design rmim Implementa¬ 
tion techniques range through gates, comparators, decoders, PROMs and PALs. 
Figure |1.6l a) shows a very simple page decoder which splits up the available 
64 kbyte memory space into eight 4 kbyte zones. The decoder output of Fig. |1.6{ b)(i) 
assumes that the 74138 is permanently enabled. Notice that the signal does not 
begin to go back high until after the address collapses, that is 10 ns after the cycle 
end. There is no problem during a Read, as the MPU will already have latched in 
the data; but during a Write, the data will collapse in 30 ns, leaving only 10 ns for 
decoder propagation delay and peripheral hold time. Using the E clock to enable 
the decoder (e.g. E to G1 in Fig. ll.6t ah extends the permissible propagation delay 
plus hold time to 30ns. For example, if we take the 74LS377 of Fie. iLTl used as an 
8-bit output port, then its hold time is 5 ns minimum and the propagation delay 
time for the 74LS138 from Cl is 26ns worst-case. Clearly a hazardous race. 

To avoid such races we can directly qualify each device which can be written 
to by either E, or preferably Q. The 74LS377 octal D flip flop array used as an 
8-bit output port is selected at the appropriate address, 6000 h in Fig. ll.7l by the 
decoder, but the data is only clocked in at the falling edge of Q. This leaves around 
\ cycle before the data collapses. Where separate enable and clock controls are 
not provided, the decoder signal may be gated by a derivative of Q. 

RAM chips are more problematical as they need to be enabled until the end of 
the cycle when being read from, but cut off early when writing to. This differen¬ 
tiation can be accomplished by qualifying the R/W signal by Q, producing: 

RAM.R/W = R/W+Q 

which is high irrespective of Q during a Read, and is just Q when writing. As 
shown in Fig. 11.81 it is normal to ensure that the RAM will not output data during 
a Write-to operation, by driving the RAM’s Output_Enable with the complement 
of R/W. The ' doctored’ RAM_R/W signal may of course be used for as many RAM 
chips as are present in the system. It may also be used to replace Q in Fig. o 
having the advantage that the output port cannot be erroneously read. 

Care must be taken when interfacing memory chips to choose a device with 
a suitable access time. This is especially true for more recent MPUs, which can 
run at higher clock rates. The access time for a memory chip is normally given 
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Figure 1.7 A simple byte-sized output port. 


Address Decoder (OOOOh) 



Figure 1.8 Talking to a 6116 2 kbyte static RAM chip. 
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as the duration from the application of a stable address or chip enable until the 
activation of the cell to be read from or written to. In the 6116 RAM, this internal 
decoding occurs irrespective of the state of the chip enables. Looking first at 
RAM interfacing and taking Fig. 11.81 as an example, it is clear that the writing 
action is the more critical as this will end earlier at the falling edge of Q. From 
Fig. I1.4l we see that we have t AQ + t QH less the RAM data setup time. The Hitachi 
HM6116AP-20 has a setup requirement of 50 ns and a 200 ns access time, so: 

t aq + Iqh — 50 > 200 

Iaq + f qh — 250 ns 

At 1 MHz, t AQ + t q H is 455 ns, but this shri nk s to 230 ns for a 2 MHz clock. Thus 
a 150 ns access time RAM chip must be used in the latter instance; for example 
the Hitachi HM6116AP-15. The 6264 RAM has an access time measured from the 
chip select. In this case the address decoder delay must be part of the calculation. 
An example of this is given in Section 3.3. 

ROM chips are interfaced in a si mi lar fashion, but of course they are read¬ 
only. Referring to the timing diagram of Fig. 11.31 we see that as data from the 
ROM must be present f DSR before the end of the cycle, we have the relationship 
t C yc — f avs — f dsr — taccess- At 1 MHz this sums to 720 ns, and 380 ns at2MHz. Most 
of the smaller EPROMs, for example the 2 kbyte Texas Instruments TMSD2516JL, 
have 450ns access times. The TMS2764-25JL is an 8kbyte 250ns device and is 
therefore suitable for the higher-speed processor. 

Rather than qualifying each write-to peripheral by Q, it is possible to enable 
the address decoder directly. Thus the decoder should have a lengthy output 
pulse when a read is in operation, but be cut short (at the end of Q) when a write 
is in progress. This relationship can be written as: 

Enable = (R/W-(E+Q)) + (R/W-Q) 

giving the decoder output waveforms shown in Fig. I1.6( b)(ii) and (iii). To make 
use of the two active low G2A and G2B 74138 inputs, a little Boolean algebra 
yields: 

(R/W-E) + (R/W-Q) + (R/W-Q) 

(R/W-E) + Q- (R/W + R/W) 

(R/W-E) + Q 

(R/W + QWQ + E) = (G2AHG2B) 

giving the qualifying network of Fig. ll.6I a). 

Special-purpose 6800 family peripheral interface devices, such as the PIA of 
Fig. oca, are designed to work in harmony with older MPU types which only 
provide an E signal. They all have an enable input designed to be directly driven 
by E, and have data hold time requirements within the 30 ns li mi t. They must not 
be disabled early in the cycle by a Q related signal. This means that 68xx periph¬ 
erals cannot be selected by a modified decoder, such as in Fig. I1.6l a). However, 
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Figure 1.9 Interfacing a 6821 Peripheral Interface Adapter to the 6809. 


it is permissible to mix the two kinds of peripheral devices, each enabled by the 
appropriate address decoder. For example, a primary address decoder could en¬ 
able a simple secondary decoder for 68xx peripheral devices, and a more complex 
Q related secondary decoder for simple interface circuitry. 
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CHAPTER 2 


The 6809 Microprocessor: Its 
Software 


The 6809 processor's instruction set was designed to be upwardly compatible 
with its predecessor, the 6800. Indeed many of the common instructions even 
have the same machine code; for example the operation to clear location 2000 h 
(CLR 2000h) is coded as 7F-20-00 in both cases. Notwithstanding, many new in¬ 
structions were introduced giving greater flexibility and subsuming several older 
instructions. Thus the older 6800 device could only push its Accumulators into 
the stack (i.e. PSHA and PSHB; the equivalent 6809 instruction can push any or all 
its registers in one go: for example PSHS A, B, CC, DP, X, Y, U, PC. 

As we shall see, enhancing stack-based operations facilitates the production 
of efficient high-level language code. To this end, the 6809 also features an ex¬ 
tended arithmetic functionality and a limited repertoire of 16-bit operations. Ad¬ 
ditionally, the number of available address modes was considerably enlarged, in 
particular those involving computed effective addresses. 

In this chapter I will overview the instruction set and address modes. Some ex¬ 
ample program subroutines will tie these together, and give us a base to compare 
with the 68000 MPU software introduced in Chapter 4. Detailed consideration of 
subroutines and interrupts are left to Chapters 5 and 6. 


2.1 Its Instruction Set 

Although the 6809 instruction set was designed to be upwardly compatible with 
that of the 6800, in fact the number of distinct operations was reduced from 72 
to only 59. Its increased power, of the order of 260% m, comes instead from 
the additional functionality of these instructions, the capability of using more 
registers and the extra address modes. First and second generation 8-bit MPUs, 
such as the 8080/8085 and 6800 devices, encoded all instructions as a byte-sized 
operation code (op-code). Thus no more than 256 operation-register-address 
mode combinations were possible. Third generation devices such as the Z80 
and 6809 MPUs can use two bytes for this function. Whereas the 6800 MPU has 
only 197 op-codes (out of a ma xi mum of 256), the 6809 has 1464 op-codes. As 
an example, the primary op-code for PuSH onto the System stack is 34 h, the 
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complete code for PSHS A,B,X is 34-16 h. In binary this is 0011 0100 0001 
0110, where each bit of the post-byte represents a register to be saved according 
to the format shown in Fig. 12.11 Of course the programmer normally need not be 
concerned with detail at this level; the assembler will take care of such matters. 


PC 

u 

Y 

X 

DP 

B 

A 

CC 


Push order 


Figure 2.1 Postbyte for pushing and pulling. 


Typically around 40% of instructions at machine-code level involve shuffling 
data in-between registers and out to memory 0, so we will look first of all at 
data movement instructions, as summarized in Table [77T1 The Load and Store 
operations copy data between memory and register. Both 8- and 16-bit moves are 
possible, but as memory is addressable only one byte at a time, the latter move in¬ 
volves two consecutive transfers. Thus the instruction LDX OClOOh will perform 
as shown in Fig. l2.2l aL Note how the most significant byte (MSB) of X comes from 
the least significant memory location Cl 00 h and the least significant byte (LSB) 


from the next highest location C1 01 h , thus 


C100/r 

, ClOl/i 

MSB 

LSB 


The same order is observed when sending out multiple-byte data, for example 
STX 0C000. In general, data structures in the 6800/68000 family are ordered 
with the MSB in the lowest consecutive memory location. Some other proces¬ 
sors, such as the 808x family, are ordered with the MSB as the lowest successive 
memory location. 

Notice that no Store to Direct Page register operation exists. To set up this 
register to, say, 80 h, the sequence: 


LDA #80h 
TFR A,DP 


first places the number 80h in Accumulator_A (it could equally be B) and then 
transfers this to the DP register. This overhead is justified as the DP register 
is (or should be) rarely altered. The TransFeR instruction can move the con¬ 
tents of any 8-bit register (A,B,DP,CC) to any other, or any i6-bit register contents 
(X,Y,U,S,D,PC) to any other. The upper and lower nybbles (four bits) of the post¬ 
byte determine the source and destination register respectively, according to the 
code: 


0000 = D 0001 = X 0010 = Y 0011= U 0100= S 

0101= PC 1000 = A 1010 = B 1010 = CCR 1011= DP 


thus TFR A, DP is coded as 1 F-8Bh (post-byte 1 000 1011 fc). EXchanGe works in 
a si mi lar way between like-sized registers with the same post-byte construction. 
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Table 2.1 Move instructions. 

Flags 


Operation 

Mnemonic 

V N 

z 

c 

Description 

Exchange 






Exchanges two like-sized 

R1—R2 1 

EXC R1,R2 

• 

• 

• 

• 

register contents 

e-g- 

EXC A,B 

• 

• 

• 

• 

i — i 

CO 

A 

1 

1 

V 

< 
i_i 

Load 






Moves data to register 

to A; to B 

LDA; LDB 

0 

V 

V 

• 

[A]<-[M] ; [B]<-[M] 

to D 

LDD 

0 

V 

V 

• 

[D]<-[M:M+1] 

to X; to Y 

LDX; LDY 

0 

V 

V 

• 

[X]<-[M:M+l]; [Y]<- [M :M+l] 

to S; to U 

LDS; LDU 

0 

V 

V 

• 

[S] <- [M : M+l] ; [U] <- [M : M+l] 

Push 






Moves registers onto Stack 

to System stack 

PSHS regs 

• 

• 

• 

• 

Listed registers to S stack 

to User stack 

PSHU regs 

• 

• 

• 

• 

Listed registers to U stack 

e-g- 

PSHS A,B,X 

• 

• 

• 

• 

A,B and X to S stack 

Pull 






Moves stack data to registers 

from System stack 

PULS regs 

• 

• 

• 

• 

S stack to listed registers 

from User stack 

PULU regs 

• 

• 

• 

• 

U stack to listed registers 

e-g- 

PULS A,B,X 

• 

• 

• 

• 

S stack to A,B and X 

Store 






Moves data from register 

from A; from B 

STA; STB 

0 

V 

V 

• 

[M] <-[A] ; [M] <- [B] 

from D 

STD 

0 

V 

V 

• 

[M : M+l] <- [D] 

from X; from Y 

STX; STY 

0 

V 

V 

• 

[M : M+l] <- [X] ; [M:M+1]<-[Y] 

from S; from U 

STS; STU 

0 

V 

V 

• 

[M:M+1]<-[S] ; [M:M+1]<-[U] 

Transfer 






Transfers two like-sized 







register contents 

R1—R2 1 

TFR R1,R2 

• 

• 

• 

• 


e-g- 

TFR A,DP 

• 

• 

• 

• 

[DP] <— [A] 


0 Flag always reset 

1 Flag always set 

• Flag not affected 

V Flag operates in the normal way 

Note 1: Register pairs must either be 8-bit A,B,CC,DP or 16-bit X,Y,S,U,PC. 


The programmer can easily keep two separate stacks using the System Stack 
Pointer and User Stack Pointer registers. These stacks are normally set up at the 
beginning of the program, simply by using the relevant Load operation. Thus if 
we wish to define RAM from 1 FFFh downwards as the System stack and 1 8FFh 
downwards as a User stack, the sequence: 

LDS #02000h 
LDU #01900h 


will accomplish this. Notice that the Top Of Stack (TOS) in both cases is one above 
physical memory. This is because the Push and Pull operations, as well as the 






22 C FOR THE MICROPROCESSOR ENGINEER 


OCIOOh 


IX 



ClOOh 


ClOlh 

M 

M+1 


© 

© 

IXH 

IXL 


OCIOOh 


D 


ClOOh 


ClOlh 

M 

M+1 

© 


© 

A 

B 


(a) LDX OCIOOh; X^M;M+1 (b) STD OCIOOh; M;M+1^D 


Figure 2.2 Moving 16-bit data at one go'. 


system operations of jumping to a subroutine and implementing an interrupt, 
decrement the relevant Stack Pointer before moving data. As mentioned earlier, 
the Push and Pull operations allow any register or set of registers to be pushed or 
pulled into or out of a stack at one go. This facilitates the passing of arguments 
to and from subroutines, and allows called subroutines to use registers without 
corrupting register-held data in the calling program (see Section 5.2). 

Figure EH] shows how the post-byte is calculated for a Push or a Pull. Specif¬ 
ically the System stack is shown; if the User stack is being employed then U is 
replaced by S. Figure COI shows a snapshot of memory after a Push onto the Sys¬ 
tem stack. If only a subset of registers are saved, then the same order is preserved 
as in the diagram. The time-taken for a Push or Pull is five cycles plus one cycle 
per byte moved. In Fig. l2.3l this adds up to f 7 cycles. 

The 6809 implements the normal Add and Subtract operations, as shown in 
Table 12.21 both with and without carry, targeted on an 8-bit Accumulator. An 
Accumulator_D-based 16-bit Add and Subtract instruction is also provided, but 
unfortunately not with a carry. An unsigned addition of Accumulator_B to the 
16-bit X Index register can also be classed as double, but the 8-bit addend is 
promoted to 16-bit at addition time, by assuming an upper byte of zero, hence the 
terminology unsigned. Thus for example, ABX #56h actually adds the constant 
0056h to X. 

It is possible to promote a signed number in Accumulator_B to its 16-bit equiv¬ 
alent in Accumulator_D by using the Sign Extension instruction. This zeros 
Accumulator_A if bit 7 of B is 0 and fills A with ones (A <- FFh) otherwise; for 
example [B] = 1011 0011 b (-83) becomes [D] = 11111111 1 01 1 001 1 b (-83). 
The Sign Extension (SEX) instruction makes the 6809 unique as the only MPU 
offering sex appeal! 

Any 16-bit Index or Stack register can be summed with an 8-bit Accumulator 
(which is automatically sign extended), Accumulator_D or a constant by means of 
the Load Effective Address (LEA) instruction. This makes use of the arithmetic 
provision which computes effective addresses in the Indexed address mode. We 
will discuss this in the next section, but as an example the instruction: 

LEAX 1,X ; Coded as 30-01h 













Push sequence 
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LDS #02000h 


Before. 


SSPI 


TOS 


After 


SSPI 


PSHS A,B,X,Y,U, 

DP, CC 

PCL 

1FFF 

PCH 

1FFE 

UL 

1FFD 

UH 

1FFC 

YL 

1FFB 

YH 

1FFA 

XL 

1 FF9 

XH 

1 FF8 

DPR 

1 FF7 

B 

1 FF6 

A 

1 FF5 

CCR 

1 FF4 

1 FF3 

s 

1 FF2 


\ 




< 


< 


/ 


Program Counter contents 


User Stack Pointer * 


Y Index register contents 


X Index register contents 


Direct Page register contents 

B Accumulator contents 


A Accumulator contents 


Code Condition register contents 


* System Stack Pointer for a 
Push/Pull into the User Stack. 


Figure 2.3 Stacking registers in memory using PSH and PUL. Also applicable to IRQ and NMl interrupts. 
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Table 2.2 Arithmetic operations 


Flags 


Operation 

Mnemonic 

V 

N 

z 

c 

Description 

Add 

to A; to B 

ADDA; ADDB 

V 

V 

V 

V 

Binary addition 

[A] <- [B] + [M] ; [B]<- [B] + [M] 

to D 

ADDD 

V 

V 

V 

V 

[D] <- [D] + [M: M+l] 

B to X 

ABX 

• 

• 

• 

• 

[X] <- [X] + [00 | B] 

Add with Carry 

to A; to B 

ADCA; ADCB 

V 

V 

V 

V 

Includes carry 

[A]<-[A] + [M]+C; [B]<-[B] + [M]+C 

Clear 

memory 

CLR 

0 

0 

1 

0 

Destination contents zeroed 
[M]<-00 

A; B 

CLRA; CLRB 

0 

0 

1 

0 

o 

o 

1 

V 

1 — 1 

CQ 
i_i 

o 

o 

1 

V 

1—1 
< 

Decrement 

memory 

DEC 

l 

V 

V 

• 

Subtract one, produce no carry 

A; B 

DECA; DECB 

l 

V 

V 

• 

[A] <- [A] -1; [B]<-[B]-1 

Increment 

memory 

INC 

2 

V 

V 

• 

Add one, produce no carry 
[M]<-[M]+1 

A; B 

INCA; INCB 

2 

V 

V 

• 

[A]<-[A]+l; [B]<-[B]+l 

Load Effective Add 

X; Y 

ress 

LEAX; LEAY 

• 

• 

V 

• 

Effective Address to register 
[X]<-EA; [Y]<-EA 

S; U 

LEAS; LEAU 

• 

• 

• 

• 

[S]<-EA; [U]<-EA 

Multiply 

MUL 

• 

• 

V 

3 

Multiplies [A] by [B] 

[D] <- [A] x [B] 

Negate 

memory 

NEC 

4 

V 

V 

5 

Reverses 2's complement sign 
[M]<- -[M] 

A; B 

NEGA; NECB 

4 

V 

V 

5 

1—1 

CQ 

i _ i 

1 

1 

V 

i—i 

CQ 

i _ i 

i—i 
< 
i _ i 

1 

1 

V 

i—i 
< 

Sign Extend 

SEX 

• 

V 

V 

• 

Promotes signed B to signed D 
[D]<-00| [B] or [D]<-FF|[B] 

Subtract 

from A; from B 

SUBA; SUBB 

V 

V 

V 

V 

Binary subtraction 

[A]<- [A] — [M] ; [B] <- [B] - [M] 

from D 

SUBD 

V 

V 

V 

V 

[D] <- [D] - [M : M+l] 

Subt with Carry 

from A; from B 

SBCA; SBCB 




V 

Includes carry (borrow) 

[A] <- [A] - [M] -C ; [B]<- [B] - [M] -C 


Note 1: Overflow set when passes from 10000000 to 011 1 1 1 11, i.e. an apparent sign change. 
Note 2: Overflow set when passes from 01111111 to 10000000, i.e. an apparent sign change. 
Note 3: Carry set to state of bit 7 product, i.e. MSB of lower byte; for rounding off. 

Note 4: Overflow set if original data is 1 0000000 (-128), as there is no +128. 

Note S: Carry set if original data is 00000000; for multiple-byte negation. 
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calculates the effective address as [X] + 1 and loads it into the X Index register 
([X] <- [X] + 1); thus it is the equivalent to an increment X (INX) instruction, 
which is missing from the 6809's repertoire. Much more powerful permutations 
of LEA exist, thus: 

LEAY A,X ; Coded as 31-96h 


promotes a signed number in Accumulator_A to 16-bits, adds this to the con¬ 
tents of the X Index register and puts the result in the Y Index register ([Y] <- 
SEX | [A] + [X])! 

The contents of any read-write memory location, or any 8-bit Accumulator can 
be directly incremented or decremented by using the INC or DEC instruction. As 
noted, the X,Y,S,U registers can be similarly augmented by using the LEA instruc¬ 
tion. Notice that INC and DEC do not set the Carry flag, which makes multiple-byte 
Increment and Decrement operations awkward (use ADD #1 and SUB #1 instead). 
Increment sets the overflow flag when the target goes from 0,lllilllh through 
to 1,0000000b (seemingly from + to -) and Decrement likewise when going from 
1,0000000b through to 0,11 11 1 1 1 b (- to +). INC and DEC on memory are classi¬ 
fied as read-modify-write operations, as during execution, data is fetched from 
memory, modified and then sent back. Clearing (CLR) memory strangely works 
in the same way — although the original value is irrelevant. 

It is possible to multiply the two 8-bit Accumulator contents using the MUL in¬ 
struction, giving a 16-bit product overwriting the original contents of Accumula- 


tor_D; thus 


x 


B 


leads to 


B 


For this purpose 


the multiplier and multiplicand are treated as unsigned. The 16-bit product may 
be truncated by using only the contents of Accumulator_A as the outcome, effec¬ 
tively dividing by 256 (equivalent to moving the binary point left eight places). 
Instead of truncating, this 8-bit product may be rounded off by adding the MSB 


of Accumulator_B to Accumulator_A, in effect adding the |bit. To facilitate this, 
MUL sets the C flag to the state of bit 7 of B. Thus the sequence: 


MUL ; Multiply [A] and [B] giving a 16-bit product as [D] 

ADCA #0 ; Add Carry to [A] (now can disregard contents of B) 


would give the required rounded 8-bit product in Accumulator_A. 

It is of course possible to multiply or divide by powers of two by shifting left 
or right as appropriate. Also a combination of shift and add or shift and subtract 
can be used to multiply or divide by any number [3]. Table [2731 gives the range of 
Shift instructions available. All of these operate on an 8-bit Accumulator or on 
any read/write memory location through the read-modify-write mechanism. 

Linear Arithmetic Shift instructions move the 8-bit operand left or right with 
the Carry flag catching the emerging bit. In the case of ASR, the sign bit propagates 
right; thus 7,1 1 1 01 00b (-12) becomes 7,7111 01 0 b (-6) - 1 ,77111 01 b (-3) etc. 
and 0,0001 100b (+12) becomes 0,0000110 (+6) - 0,000001 lb (+3) etc. The 
Logic Shift Right equivalent always shifts in zeros from the left. Logic Shift 
Left and Arithmetic Shift Left are equivalent, and some assemblers permit 
the use of the alternative LSL mnemonic. 
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Table 2.3 Shifting Instructions. 


Operation 


Flags 

Mnemonic V N Z C 


Description 


Shift left, arithmetic or logic 






memory 

ASL 

1 

V 

V 

b7 

A; B 

ASLA; ASLB 

1 

V 

V 

b7 

Shift right, logic 






memory 

LSR 

• 

V 

V 

bo 

A; B 

LSRA; LSRB 

• 

V 

V 

bo 

Shift right, arithmetic 






memory 

ASR 

• 

V 

V 

bo 

A; B 

ASRA; ASRB 

• 

V 

V 

bo 

Rotate left 






memory 

ROL 

1 

V 

V 

b7 

A; B 

ROLA; ROLB 

1 

V 

V 

b7 

Rotate right 






memory 

ROR 

• 

V 

V 

bo 

A; B 

RORA; RORB 

• 

V 

V 

bo 


Linear shift left into carry 
C 


Linear shift right into carry 
0 


0 


As above but keeps sign bit 


b7 

— 










*0 


Circular shift left into carry 
C 


Circular shift right into carry 
0 - 


Note 1: V=b 7 ©b 6 before shift. 

Circular or Rotate Shift instructions are similar to Add with Carry, in that they 
can be used for multiple-precision operations. A Rotate takes in the Carry from 
any previous Shift and in turn saves its ejected bit in the C flag. As an example, 
a 24-bit word stored in 


24 M 

right once by the sequence E): 


16 


15 


M+l 


M+2 


can be shifted 


LSR 

M 

; o - 

=> 

M 

ROR 

M+l 

; b )6 / c — 

=> 

M+l 

ROR 

M+2 

; b 8 / c - 

=> 

M+2 


t>l 6 

b s 

b 0 



In all types of Left Shifts, the overflow flag is set when bits 7 and 6 differ 
before the shift (i.e. b 7 ©b 6 ), meaning that the (apparent) sign will change after 
the shift. 

The logic operations of A ND, OR, Exclusive-OR and NOT (Complement) are 
provided, as shown in Table 12.41 The only unusual feature here is the special 
instructions of ANDCC and ORCC for clearin g or setting flags in the Code Condition 
register. Thus to clear the I mask (see Fig. ll.lt we have: 



































































































ITS INSTRUCTION SET 27 


ANDCC #11101111b ; Coded as lC-EFh (equivalent to CLI) 

and to set it: 

ORCC #00010000b ; Coded as lA-lOh (eqivalent to SEI) 

This saves having to provide a series of separate instructions targeted at each 
of the CCR flags and masks, such as the 6800's CLI and SEI (CLear and SEt 
Interrupt mask), and also allows more than one flag to be set or cleared in a 
single instruction. 


Table 2.4 Logic instructions. 
Flags 


Operation 

Mnemonic 

V N 

z 

c 

Description 

AND 






Logic bitwise AND 

A; B 

ASL 

0 

V 

V 

• 

[A]<-[A]-[M]; [B]<-[B]■[M] 

CC 

ANDCC #nn 

C; 

m cle 

ar 

[CCR]<-[CCR]-#nn 

Complement 






Invert (l's complement) 

memory 

COM 

0 

V 

V 

i 

[M]<- [M] 

A; B 

COMA; COMB 

0 

V 

V 

i 

[A] <- [A] ; [B]<-[B] 

Exclusive-OR 






Logic bitwise Exclusive-OR 

A; B 

EORA; EORB 

0 

V 

V 

• 

[A] <- [A] © [M] ; [B] <- [B] ffi [M] 

OR 






Logic bitwise Inclusive-OR 

A; B 

ORA; ORB 

0 

V 

V 

• 

[A] <- [A] + [M] ; [B]<- [B] + [M] 

CC 

ORCC #nn 

Can set 

[CCR]<-[CCR]+#nn 


The setting of the CCR flags can be used after an operation to make some 
deduction about, and hence act on, the state of the operand data. Thus, to deter¬ 
mine if the value of a port located at, say, 8080 h is zero, then: 

LDA 8080h ; Move in data & set Z & N flags as appropriate {86-80-80h} 

BEQ SOMEWHERE ; Go somewhere if Z flag EQuals zero {27-xxh} 

will bring its contents into Accumulator_A and set the Z flag if it is zero. Branch 
if EQual to zero will then cause the program to skip to another place. The 
N flag is also set if bit 7 is logic 1, and thus a Load operation can enable us to 
test the state of this bit. The problem is, loading destroys the old contents of the 
Accumulator, and the new data is probably of little interest. A non-destructive 
equivalent of loading is TeST, as shown in Table [23] The sequence now becomes: 

TST 8080h ; Check data & set Z & N flags as appropriate {7D-80-80h} 

BEQ SOMEWHERE ; Go somewhere if Z flag EQuals zero {27-xxh} 

but the Accumulator contents are not overwritten. However, 16-bit tests must 
be carried out using a 16-bit Load operation as only 8-bit TeST instructions are 
provided. 

TeST can only check for all bits zero or the state of bit 7. For data already in 
an 8-bit Accumulator, ANDing can check the state of any bit; thus: 
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Table 2.5 Data test operations. 
Flags 


Operation 

Mnemonic 

V 

N 

z 

c 

Description 

Bit Test 

A; B 

BITA; 

BITB 

0 

V 

V 

• 

Non-destructive AND 
[A]-[M] ; [B]-[M] 

Compare 

with A; B 

CM PA; 

CMPB 

V 

V 

V 

V 

Non-destructive subtract 
[A] - [M] ; [B] - [M] 

with D 

CMPD 


V 

V 

V 

V 

[D] — [M:M+1] 

with X; Y 

CMPX; 

CMPY 

V 

V 

V 

V 

[X] - [M: M+l] ; [Y] — [M : M+l] 

with S; U 

CMPS; 

CMPU 

V 

V 

V 

V 

[S] — [M:M+1] ; [U] — [M:M+l] 

Test for Zero or Minus 

memory 

TST 


0 

V 

V 

• 

Non-destructive subtract from zero 
[M] -00 

A; B 

TSTA; 

TSTB 

0 

V 

V 

• 

O 

o 

1 

CO 

O 

O 

1 

i—i 
< 


ANDB #00100000b ; Clear all Accumulator B bits except 5 {C4-20h} 

will set the Z flag if bit 5 is 0, otherwise Z will be cleared. Once again this is a 
destructive examination, and the equivalent from Table [7751 is BIT test; thus: 

BITB #00100000b ; Coded as C5-20h 

does the same thing, but with the contents of Accumulator_B remaining un¬ 
changed; and more tests can subsequently be carried out without reloading. 

Comparison of the magnitude of data in an Accumulator with either a constant 
or data in memory requires a different approach. Mathematically this can be 
done by subtracting [M] from [A] and checking the state of the flags. Which 
flags are relevant depend on whether the numbers are to be treated as unsigned 
(magnitude only) or signed. Taking the former first gives: 

[A] Higher than [M] : [A]—[M] gives no Carry and non-Zero C=0, Z=0 (C + Z=1) 

[A] Equal to [M] : [A]—[M] gives Zero (Z=l) 

[A ] Lower than [M] : [A]—[M] gives a Carry (C=l) 

The signed situation is more complex, involving both the Negative and over¬ 
flow flag. Where a subtraction occurs and the difference is positive, then either 
bit 7 will be 0 and there will be no overflow (both N and V are 0) or else an overflow 
will occur with bit 7 at logic 1 (both N and V are 1 ). Logically, this is detected by 
the function N®V. A negative difference is signalled whenever there is no over¬ 
flow and the sign bit is 1 (N is 1 and V is 0) or else an overflow occurs together 
with a positive sign bit (N is 0 and V is 1 ). Logically, this is N®V. Based on these 
outcomes we have: 

[A] Greater than [M] : LA]—[M] — non-zero +ve result (N®V-Z = 1 or N«V+Z = 0) 

[AJ Equal to [M]: LA]—[M] — zero (Z=l) 

[A] Less than [M]: LA]—[M] — a negative result (N®V = 1) 

Subtraction is a destructive test operation and Comparison is its non-destructive 
counterpart. It is the most powerful of the Data Testing operations, as it can be 
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applied to both Index and Stack Pointer registers as well as 8- and 16-bit Accu¬ 
mulators. 


Table 2.6 Operations which affect the Program Counter. 


Operation 

Mnemonic 

Description 


Bcc 


cc is the logical condition tested 

LBcc 




Always (True) 

BRA; LBRA 

Always affirmed regardless of flags 

Never (False) 

BRN; LBRN 

Never carried out 


Equal 

BEQ; LBEQ 

Z flag set 

(Zero result) 

not Equal 

BNE; LBNE 

Z flag clear (Non-zero result) 

Carry Set 

BCS; LBCS 1 

[Acc] Lower Than 

(Carry = 1) 

Carry Clear 

BCC; LBCC 2 

[Acc] Higher or Same as 

(Carry = 0) 

Lower or Same 

BLS; LBLS 

[Acc] Lower or Same as 

(C+Z=l) 

Higher Than 

BHI; LBHI 

[Acc] Higher Than 

(C+Z=0) 

Minus 

BMI; LBMI 

N flag set 

(Bit 7=1) 

Plus 

BPL; LBPL 

N flag clear 

(Bit 7 = 0) 

Overflow Set 

BVS; LBVS 

V flag set 


Overflow Clear 

BVC; LBVC 

V flag clear 


Greater Than* 

BCT; LBGT 

[Acc] Greater Than (N ® V ■ Z = 1) 

Less Than or Equal* 

BLE; LBLE 

[Acc] Less Than or Equal (N © V ■ Z = 0) 

Greater Than or Equal* 

BCE; LBGE 

[Acc] Greater Than or Equal (N © V = 1) 

Less Than* 

BLT; LBLT 

[Acc] Less Than 

(N © V = 0) 

Jump 

JMP 

Absolute unconditional goto 

No Operation 

NOP 

Only increments Program Counter 


* 2's complement Branch 

Note 1: Some assemblers allow the alternative BLO. 
Note 2: Some assemblers allow the alternative BHS. 


All Conditional operations in the 6809 are in the form of a Branch instruction. 
These cause the Program Counter to skip xx places forward or backwards; usu¬ 
ally based on the state of the CCR flags. Excluding Branch to Subroutine (see 
Section 5.1), there are 16 Branches provided, which can be considered as the True 
or False outcome of eight flag combinations. Thus Branch if Carry Set (BCS) 
and Branch if Carry Clear (BCC) are based on the one test (C =?). 

If the test is True, the offset following the Branch op-code is added to the 
Program Counter. Thus if the Carry flag is zero: 

E100:1 BCC-08 ; Coded as 24-08h 
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will add 0008/7 to the Program Counter state El 02h to give PC = El OA/i. Note 
that the PC is already pointing to the following instruction when execution occurs, 
giving an effective destination of ten places on from the Branch location. The 
Branch offset is sign extended before addition to the Program Counter; thus if 
the N flag is zero: 

E100:1 BPL-F8 ; Coded as 24-F8h 

gives PC<-E1 02/r + FFF 8/7 = EOFA/r, which is eight places back (six places back 
from the Branch itself). With such a single signed-byte offset, the maximum range 
is only +125 and -129 bytes. 

Each 6809 Branch has a long equivalent which uses a double-byte offset. Thus 
the Conditional Branch: 

E100:1:2:3 BCC-100F ; Coded as 10-24-10-OFh 

if true forces PC to El 04/7 + 1 OOF/r = FI 1 3 h. 

Long Branches can skip to anywhere in the 64 kbyte memory space, but oc¬ 
cupy more room and take longer to execute. A normal Branch requires 3 cycles, 
whereas a Long Branch takes 6 cycles if carried out and 5 if not. Except for Long 
BRanch Always (LBRA), the op-code has a 1 Oh byte fronting the normal Branch 
op-code; thus occupying four memory bytes. LBRA is exceptional, in that it has a 
special op-code of 16 h, giving a 3-byte instruction always taking 5 cycles. Using 
a Long BRanch Always instead of a Jump is useful for position independent 
code (PIC); as by definition, the offset is relative to the Program Counter, the 
absolute destination being irrelevant. This is convenient where the program is 
to run in ROM which may be based anywhere in memory space. A plain Jump 
can only be made to an absolute location, which by defination cannot be altered 
unless the ROM is reprogrammed. 

Although Long Branches will cope with all destinations, where possible Short 
Branches should be used for efficiency. However, it can be difficult sometimes to 
predict whether a destination is within range. Some assemblers will choose for 
you at assembly time if advised accordingly, although they are unlikely to choose 
the Short Branch in all legal situations. 

The remaining instruction in Table EH is No Operation. NOP does just this, 
and as a consequence the fetch increments the Program Counter, taking 2 cycles 
to do it. NOPs are normally used in situations where a do-nothing delay is nec¬ 
essary. BRanch Never (BRN) is effectively a 2-byte NOP with a 3-cycle delay and 
LBRN takes up 4 bytes for a 5-cycle delay. 

Table l2.7l summariyes the instruction set and address modes of the 6809 fam¬ 
ily of microprocessors. 


2.2 Address Modes 

Virtually all instructions act on data; either outside the processor in its mem¬ 
ory space, or in an internal register. Thus the op-code must include bits which 
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Table \2J\ (a) The M6809 instruction set (continued next page). 


Insert page 1 of Table 2.7 here. 
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Table f2T7l (b) The M6809 instruction set (continued next page). 


Insert page 2 of Table 2.7 here. 
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Table 2.7 (c) (continued! The M6809 instruction set. Reproduced by courtesy of Motorola Semicon¬ 
ductor Products Ltd. 


Insert page 3 of Table 2.7 here. 
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inform the MPU's Control registers where this data is being held. There are a 
few exceptions to this, the so called Inherent operations, such as NOP (No Op¬ 
eration) and RTS (Return from Subroutine). Single-byte instructions whose 
operand is a single register, for example INCA (INCREMENT accumulator A), are 
also sometimes classified as Inherent. 

With the exception of Inherent instructions, the bytes following the op-code 
are either the (constant) operand itself, or more usually a pointer to where the 
operand can be found. We have already met the simplest of these, where the 
absolute address itself follows, as in: 

LDA 2000h ; [A] <- [2000] {Coded as B6-20-00h} 

Absolute addressing is rather inflexible, as the address is fixed as part of 
the program, and this must be allocated by the programmer. One of the most 
important features of a processor is its range of address modes, that is different 
techniques for evaluating the operand address. To see why this is important, 
consider, say, the problem of adding the constant 30 h to each element of an 
array of 256 data bytes stored consecutively between 2000 h and 20FFh. If we 
had only absolute addressing, the routine would look something like the listing 
in Table lZ8l a). which is a pity because the same action is repeated 256 times, and 
takes 2048 bytes of program memory. 

An alternative strategy is to use an address mode where the address is stored 
in a register which can be incremented, and fold our program into a loop as 
shown in Table nTSt b). This only takes 16 bytes, less than 1% of the absolute 
version. Furthermore, the array can be of any length without increasing the size 
of the program. However, there is a penalty to pay for this flexibility. The more 
complex address modes take longer to execute (see Table I2.7f cl under ~), and 
the loop construct has the Test and Branch overhead. Thus, the absolute array 

Table 2.8 Initializing a 256-byte array. 


BEGIN: 

LDA 

2000h 

Get array[0] 



ADDA 

#30h 

Add the constant (#) 30h 



STA 

2000h 

Restore it 



LDA 

2001h 

Get array[l] 



ADDA 

#30h 

Add the constant 30h 



STA 

2001h 

Restore it 



LDA 

2002h 

Get array[3] 



■' 

" 

and so on 



LDA 

20FFh 

Get array[255] 



ADDA 

#30h 

Add the constant 30h 


END: 

STA 

20FFh 

Restore it (phew!) 


(a) Linear coding. 



BEGIN: 

LDX 

#2000h 

Point IX to array [0] 


; While address less than 2100h add 30h to the contents 

of that address 

LOOP: 

LDA 

,X 

Get array [IX] 



ADDA 

#30h 

Add the constant 30h 



STA 

,X+ 

Put it away at [IX] and increment 

pointer 


CMPX 

#2100h 

Check for past array [256] 



BNE 

LOOP 

and repeat if not 



END: 

(b) Equivalent circular mode. 
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program would take 3072 cycles, whilst the loop equivalent 
longer at 4867 cycles to execute. 

In the re mainder of thi s section, we will look at the 6809 
this catalog, op-code may be one or two bytes. 


takes considerably 
address modes. In 


Inherent 

op-code 


All the operand information is contained in the op-code, with no specific address- 
related bytes following. All of the 6809 inherent operations are one byte long 
except Software Interrupt 2 . An example is NOP (No Operation). Motorola 
also classify most Register-Direct instructions as inherent, for example INCA (IN¬ 
CREMENT A). Table [2771 gives the Inherent instructions. 

Register Direct, £R 


op-code 


post-byte 


Information concerning the source register(s) and/or destination register(s) are 
contained in a post-byte. For example TFR A, B (TransFeR the contents of A 
to B) is coded as 0001 11111 000 1 001 b (1 F-89/t). The post-byte here is divided 
into two fields. The left field specifies the source register, and the right the 
destination. Each register is encoded as a bit in a 4-wide code. Thus 1 000b is A 
and 1 001 b is B. A list of codes is given on page[20] The Transfer, Exchange, Push, 
and Pull operations come under this category. In Table 12771 these are classified as 
Immediate. 


Immediate, #kk 


op-code 


constant 


8 bit 


op-code 


constant 


16 bit 


With Immediate addressing, the byte or bytes following the op-code are constant 
data and not a pointer to data. We have used this form of addressing before, in 
the array argument routine in Table [2781 Some examples are: 


ADDB #30h 
LDX #2000h 
CMPY #21FFh 


Add the constant 30h to Acc. B {Coded as 
Put the constant 2000h in X {Coded as 
Compare [Y] with the constant 21FFh {Coded as 


CB-30h} 

8E-20-00h) 

10-8C-21-FFh} 


The pound (hash) symbol # is commonly used to indicate a constant number. 


Absolute, M 
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op-code 


DP offset 


Short (Direct) 


op-code 


Address 


Long (Extended Direct) 


In Absolute addressing, the address itself — either in whole or part — follows 
the op-code. Motorola terms the long 16-bit address version as Extended Direct. 
There is a short version just called Direct, where the effective address (ea) is the 
concatenation of the Direct Page register with the byte following the op-code. 
Thus if this register is set at, say, 80h, then the instruction LDA 08h, coded as 
96-08 h, effectively brings down the byte from address 8008h. Some assem¬ 
blers have difficulty in deciding which of these forms to use. For example, in the 
fragment above, should the assembler generate the code B6-80-08 (LDA 8008) or 
96-08 (LDA 08)? After all, the setting of the DP register may have been altered in 
a call to a subroutine yet to be linked in. There are ways around this, but none is 
entirely satisfactory. 


Absolute Indirect, [M] 


op-code | 9Fh 


Pointer to address 


Here the op-code is followed by a post-byte 9F h and then a 16-bit address. This 
is not the address of the operand but a pointer to where the operand address is 
stored in memory. Thus, if the locations 2000:2001 h hold the address 80-08 h, 
then the instruction: 

LDA [2000h] ; [A] <- [[2000:2001]] {Coded as AF-9F-20-00h} 

effectively fetches the data down from 2000 h and then 2001 h, puts them to¬ 
gether as a 16-bit address and sends this address out on the address bus to fetch 
the data into Accumulator_A. Although the location in memory of this pointer 
address is absolute, the pointer residing there can be altered as the program 
progresses. 

As an example, consider the problem of implementing a subroutine (see Chap¬ 
ter 5) which will process in some way the contents of an array of data. Rather 
than passing each element of the array to the subroutine it makes sense to send 
only the address or pointer to the first element. This can be done by using an 
absolute address, say 2000:2001 h, to store the pointer prior to jumping to the 
subroutine. The subroutine can then use this pointer as a sort of base address 
to access any element of the array relative to this location. 

As this indirect address is at an absolute location, this address mode is only 
slightly more flexible than the ordinary absolute modes. However, indirection 
can be used in conjunction with the Indexed addressing modes discussed below. 
As in the absolute case, the effective address is in fact only the address of a 
pointer to the data and not the data itself. 
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Branch Relative 


op-code 


offset 


8-bit (Short) 


op-code 


offset 


16-bit (Long) 


We have already discussed this form of address mode in the previous section. 
Regular (or short) Branches sign extend the following 8-bit offset, and add this to 
the Program Counter. Effectively this means that offsets between 80/r and FF/r 
are treated as negative. For example the instruction BRA -06 is coded as 20-FA/i 
(FA/i is the 2's complement of 06 h) when the PC is at El 08h, is implemented as: 


1110 0001 0000 1000 (PC) = El 08/i 

+ mi mi mi 1010 (offset) = fffa/i =-6 

/I 1110 0001 0000 0010 (El 02/i, which is El 08/i - 0006h) 

In calculating this offset, it must be remembered that the PC is already point¬ 
ing to the next instruction. Thus the maximum forward point is (00)7F/i + 2 = 
127 + 2 = 129 bytes from the op-code and (FF)80/i + 2 = -128 + 2 = 126 bytes 
back. Long Branches have a 16-bit offset and can range from+32,767 and -32,768 
bytes from the following op-code, effectively anywhere in the full 64 kbyte ad¬ 
dress space of memory that the processor can address at one time. Of course 
Long Branch code is bigger and slower to execute (see Table 12.71 c) under the 
column ~). 


Indexed 

The Absolute address modes are used where operands he in fixed locations. In 
many cases, this places an unacceptable restriction on the data structures which 
can easily be processed. Compilers, for example, like to pass parameters in a 
stack, and these should then be capable of being retrieved in locations relative 
to the Stack Pointer. The 6800 MPU has a primitive form of computed effective 
address (ea), where this could be up to +FF h (+255) bytes from the contents of 
one Index register thus: 

LDAA 8,X ; [A] <- [X] + 8 

means that if X is 8000/t at the time of execution, then 8008 h is the ea of the 
data brought down to Accumulator_A. The 6809 has an additional complement of 
Index registers (X, Y, S, U and sometimes the PC), as well as an extended repertoire 
of offsets. Constant offsets of up to ±2 15 are now possible, and Accumulator_A, 
_B or _D can act as a variable offset. In addition, automatic incrementation or 
decrementation submodes are possible. A level of indirection is also provided 
for most combinations. Table [777t c) summarizes the submodes, which are coded 
as an op-code followed by a post-byte. Notice that Absolute Indirect is part of 
this table, although strictly it is not an Indexed address mode. 


Constant Offset from Register 


op-code 


post-byte 


0, R or , R 
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op-code post-byte±n 


op-code post-byte 


±n 


op-code post-byte 


±n 


± n, R (5-bit) 
±n,R (8-bit) 
±n, R (16-bit) 


Here the effective address is R ± n where R is X, Y, S or U. The actual machine 
code produced depends on the size of n, with a single post-byte capable of in¬ 
tegrally handling up to ±15. This complex encoding scheme is worthwhile, as 
most offsets are small; for example, an analysis has shown that 40% of this type 
of indexing uses a zero offset |T|. Indirect Constant Offset Index does not have 
an 8-bit (±127) offset version, the 16-bit variety being used. Fortunately the task 
of evaluating the post-byte and following bytes is handled automatically by the 
assembler. 


Post-Auto-Increment / Pre-Auto-Decrement from Register 


op-code post-byte 


,R+ / ,R++ / ,-R / 


As we saw in the listing of Table l2.8l b). indexing comes into its own when stepping 
through blocks of memory, arrays and related structures. To avoid having to 
follow (or lead) the use of the Index register with an Increment or Decrement, 
this mode provides for automatic advance or retard; thus: 


LDA 

, R+ 

Bring down 

data 

byte and 

LDA 

,-R 

Bring down 

data 

byte and 

LDA 

, R++ 

Decrement 

Index 

register 

LDA 

,-R 

Decrement 

Index 

register 


then increment Index register R 
then increment Index register R twice 
R and then bring down data byte 
R twice and then bring down data byte 


where R is X, Y, S or U. Notice that incrementing is done after and decrementing 
before the Index register is used. Double Increment/Decrement modes are useful 
when the arrays contain addresses or other double-byte data. Indirection is only 
available for this double form, as by its nature addresses are likely to be being 
accessed. 

As an example of these modes, consider the problem of multiplying two 256- 
byte arrays to give a 256 double-byte array. If array_l begins at 2000 h with 
the second array following directly, and the product array commences at 3000 h, 
then we have: 


LDX 

#2000h 

Point IX to array_l[0] 

LDY 

#3000h 

Point IY to array_3[0] 

LDA 

256,X 

Get array_2[i] 

LDB 

, X+ 

Get array_l[i]; increment i 

MUL 


Multiply them 

STD 

, Y++ 

Put it away and move on twice 

CMPX 

#21FFh 

Last element yet? 

BLS 

LOOP 

IF not past it THEN repeat 

RTS 


ELSE finished 
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Accumulator Offset from Register, A,R / B,R / D,R 


op-code 


post-byte 


As an alternative to a constant offset, any Accumulator can hold a variable offset 
to an Index register, for example: 


LDA 

B,X 

; [A] <- 

[SEX 

[ B] + [X] ] 

LDB 

A, Y 

; [B] <- 

[SEX 

[A] + [Y] ] 

LDX 

D, U 

; [X] <- 

[ [D] + 

- [U] ] : [ [D] + [U] +1] 


Note that the value of the 8-bit Accumulator is sign extended before the addition, 
giving a range of+127 to -128. Thus if Bis FEh, then FFFEh is added to theX Index 
register in the first example above to give the effective address. Of course, FFFE/t 
is effectively -2, so the target memory location is actually X - 2. If this is not 
desirable, Accumulator_A maybe cleared and D used as the offset, e.g.: 

CLRA 

LDA D,X ;[A] <- [00|[B]+[X]] 

and this allows an offset of up to +255 (FFh) in Accumulator_B. 

The use of an Accumulator allows the offset to be dynamically calculated as 
the program runs. A typical example is listed below, where we require access to 
one of a table (array) of ten elements, actually the 7-segment code. The requested 
element is already in the Accumulator_B (the decimal number 0-9), and it is to 
be replaced with the 7-segment equivalent code on exit. We are assuming that 
the subroutine starts at E200h. 


E200/1/2 8E-E2-06 

E203/4 E6-85 

E205 39 


LDX #TABLE_B0T 

LDB B,X 

RTS 


Point X to table 
Get element [B] 
Exit 


; Table of 7-segment codes begins here 

E206-E20A 01-4F-12-06-4C TABLE_B0T: .BYTE 1,4Fh,12h,6,4Ch 

E20B-E20F 24-20-0F-00-0C .BYTE 24h,20h,OFh,0,OCh 

The first instruction puts the absolute address of the first table element (E206/t) 
in the X Index register. The effective address calculated in the following instruc¬ 
tion is B + X. If, say, B is 04 h on entry, then this gives 0004 + E206 = E20Ah. 
The data in here is 4C h, and this is the value loaded into Accumulator_B. Notice 
the assembler directive . BYTE, which states that the following bytes are to be put 
into memory verbatim; that is not to be interpreted as instruction mnemonics. 


Constant Offset from Program Counter 


op-code 


post-byte 


±n 


op-code 


post-byte 


±n 


± n , PC (8-bit) 


± n, PC (16-bit) 
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One of the major advantages of the Relative address mode is that it produces 
position independent code (PIC). Thus a Branch is relative to where the program 
is at the time the decision is taken. If the program is moved to a different part 
of memory, all the offsets move with it unchanged. This is what differentiates 
a Branch from a Jump operation. The Program Counter Offset mode extends 
the PIC capability to any instruction which has an Indexed address mode. This is 
similar to the Constant Offset from Register mode, but with the Program Counter 
being the Index register. For example in: 

LDA 200h,PC ;[A] <- [200+[PC]] 

the data 200 h bytes on from where the PC is on execution (pointing to the fol¬ 
lowing instruction) is placed in Accumulator_A. This of course is not an absolute 
address, as only the distance from the instruction is of interest. PIC is especially 
suitable for code in ROM (i.e. firmware) which can be placed anywhere in the ad¬ 
dress space. Thus a vendor could sell a ROM-based floating-point package with 
no a priori knowledge of where the customer will locate the firmware in memory. 

As an example of this, consider the 7-segment decoder routine previously dis¬ 
cussed. Line 1 of the actual code (shown second column from the left) contained 
the bytes E2-06/t, which is the absolute location of the table bottom. If, say, the 
table of data was to start at C1 80h, then the ROM would have to be reprogrammed 
to make these two bytes Cl -80h, the rest of the code remaining unaltered. Here 
is a PIC version of the same routine: 


C102/3/4 

30-8C-03 


LEAX 

3, PC 

;Effect!ve 

address PC+3 is 

loaded into 

X, which then points to the table 

C105/6 

E6-85 


LDB 

B,X ; Get element [B] 

C107 

39 


RTS 


; Table of 

7-segment codes 

begins here 



C108-C10C 

01-4F-12-06-4C 

TABLE_B0T: 

.BYTE 

1,4Fh,12h,6,4Ch 

C10D-C111 

24-20-0F-00-0C 


.BYTE 

24h,20h,OFh,0,OCh 


The only difference between the two programs is in line 1. In the first case, the 
absolute address of the table bottom is put into the X Index register. In the re¬ 
locatable case, the X Index register is loaded with the contents of the Program 
Counter+3, which is again the address of the bottom of the table, but is the dif¬ 
ference between the instruction following step 1 (i.e. at Cl 05h) and the base of 
the table. If the program is bodily moved somewhere else, the offset of three 
bytes to the table remains the same. Thus the address of the table is calculated 
during each run rather than before (at load time). 

As with Branch operations, assemblers save the programmer having to calcu¬ 
late this offset, by permitting the use of an absolute label in this type of address 
mode; thus assembling: 

LEAX TABLE_B0T,PC 

still produces the same code 30-8C-03 h\ that is the label TABLE_B0T is interpreted 
by the assembler as the distance from the following instruction to the absolute 
address TABLE_B0T and not the absolute value Cl 08h. 
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We first met the Load Effective Address (LEA) instruction in Table l2.2l Here 
we observed that it could be used to perform simple arithmetic on the X, Y, U or S 
registers. Essentially, any effective address computed by any of the Direct Index 
address modes, except Post-Increment/Pre-Decrement, can be loaded into one of 
these four registers. A few examples are: 

LEAX +2,X ; The EA of X+2 is put into X, effectively incrementing X by 2 
LEAY D,X ; Adds [D] to [X] and puts sum in Y 
LEAS -20,S ; Moves the Stack Pointer down 20 bytes 


2.3 Example Programs 

Previously we have used program fragments to illustrate various instruction/address 
mode combinations. Here we conclude our look at 6809 assembly-level software 
by developing three programs of a slightly more elaborate nature. This will serve 
to integrate at least some of the concepts we have discussed, and provide for a 
comparison with equivalent software using 68000 code in Chapter 4. Each pro¬ 
gram module is written in the form of a complete subroutine; that is data is 
assumed present on entry in some place, usually in a register, and is terminated 
by a Return from Subroutine (RTS) instruction. Subroutine structure is the 
subject of Chapter 5. 

Implementing a software function involves developing an appropriate algo¬ 
rithm, writing code in a suitable language, testing and debugging. There is little 
that can be done to mechanize the former, as algorithms are an expression of 
human creativity. Once this has been done, a range of software tools, such as 
assemblers, linkers, compilers and simulators, exist to aid in the production of 
the latter phases. We will look at these in some detail in Part 2. 

The most fundamental software tool is the assembler. An assembler is a pro¬ 
gram that translates, on a line for line basis, symbolically-coded native language 
to machine code for the target processor. This saves the error-prone tedium 
of working out op-codes and relative offsets. Nearly as important is the use of 
mnemonics for instructions and names for locations (labels). These, together 
with the use of comments, provide superior documentation compared to strings 
of hexadecimal digits (see page l!68l . 

At this point in the text, we are only concerned to provide sufficient back¬ 
ground to allow the reader to follow program syntax as presented in the re¬ 
mainder of the text. Assemblers, like any other commercial package (such as 
a word processor), have their own peculiar rules and peccadilloes, which have to 
be learnt. One common denominator is the virtually unanimous use of the pro¬ 
cessor manufacturers' standard instruction mnemonics, with minor variations. 
Most of the variations lie in the layout of the source code and the directives (or 
pseudo operators) used to pass information from the programmer to the assem¬ 
bler. 

A line of source code comprises four fields: an optional label, the essential 
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instruction mnemonic, the operand (if any) and an optional comment. Some as¬ 
semblers require all fields to be present in spirit, their absence being signalled 
by spaces or tabs. The Real Time Systems XA8 cross assembled used here has a 
free format, where absent fields can simply be omitted. The only essential role 
of space is in separating the instruction mnemonic from its operand. However, 
as the following code fragment shows, spaces and tabs should be used for read¬ 
ability: 

BCC NEXT;IF no Carry THEN don't add one to int X 
ADDD #1 

NEXT:RTS;and return 
or 

BCC NEXT ; IF no Carry THEN don't add one to int X 

ADDD #1 

NEXT: RTS ; and return 

The latter source code is obviously more pleasing to the eye. Notice that lines 1 
and 2 have no label, line 2 no comment and line 3 no operand field. 

Looking at the syntax in more detail. 

Labels 

These are defined in the first field and should be delineated by a colon. The colon 
is omitted when the label is referred to in the operand field. The label takes on 
the value of the Program Counter pointing to the first instruction byte. Labels 
can be up to 15 meaningful alphanumeric (including _ and .) characters long, 
and should not start with a numeral. 


Operator mnemonics 

These are the standard manufacturer's mnemonics, with a few minor extensions. 
There must be an entry in this field. 


Operand 

These may be a label, defined name, address or data constant. Numbers may be 
in decimal, hexadecimal, octal, binary or ASCII. Thus the following all translate 
to the same: 


LDA #43h 
LDA #67 

LDA #01000011b 
LDA #103o 
LDA #'C' 


Codes as 86-43h. 
Codes as 86-43h. 
Codes as 86-43h. 
Codes as 86-43h. 
Codes as 86-43h. 


Use a 0 prefix if MSD is alpha, e.g. 0F6h 

Decimal 67 is 43 hex 

Binary 01000011 is 43 hex 

Octal 103 is 43 hex 

ASCII 'C' is 43 hex 


1 Real Time Systems, M & G House, Head Road, Douglas, Isle of Man, British Isles; Intermetrics 
Microsystems Software Inc., 733 Concord Avenue, Cambridge MA 02138, USA.; Whitesmiths Aus¬ 
tralia Pty Ltd. PO Box 756, Suite 3, 47 Regent Street, Kogarah NSW 2217, Australia; COSMIC SARL, 
33 rue Le Corbusier, EUROPARC CRETEIL, 94035 CRETEIL CEDEX, France and ADaC, Nihon Seimei 
Otsuka Bldg., No. 13-4 Kita Otsuka 1-chome, Toshima-Ku, Tokyo 170 Japan. 
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but the use of the appropriate form aids in readability and thus documentation. 

Mathematical expressions can be used to generate a constant at assembly time, 
thus: 


LDA MSD-1 

LDA ARRAY+(i*5)+j 

BRA .+3 


Get data from address MSD less one 
Get data from address ARRAY plus 
i rows of 5 and j columns 
Branch forward 3 places 


Comment 

The final field is simply a documentation comment, delimited by a semicolon ;. 
Whole-line comments are possible with an initial semicolon. Some assemblers 
use an asterisk * to delimit comments. 

Some of the more common assembler directives, all of which are distinguished 
by a leading period, are: 

.PROCESSOR 

The first line of source code must indicate which processor is being targeted, e.g.: 
.processor m6809 
for the 6809 MPU. 


.END 

The last line of source code must be . end. 


.DEFINE 

This gives a permanent value to a symbol. For example: 

.DEFINE ERROR = OFFh, 

TRUE = 01, 

FALSE = 0, 

PIA_BASE = 8080h 


CM PA 

#ERR0R 

BEQ 

ABORT 

CM PA 

#FALSE 

BEQ 

REPEAT 

CM PA 

#TRUE 

BNE 

ABORT 

LDB 

PIA_BASE+2 


This mechanism is useful in assigning names to absolute locations, such as 
those associated with hardware interface ports, and to constants which have a 
readily identifiable meaning. Placing definitions at the start of the source pro¬ 
gram means that such constant data and addresses can be altered throughout 
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the source file by simply altering this header. The mnemonic EQU (EQUate) is 
frequently used in other assemblers to perform the same function; see page 11801 

.INCLUDE 

Source code in separate files can be included for assembly by using this directive, 
for example: 

.INCLUDE "stdio.h" ; Insert the I/O header file at this point 


.PSECT 

A useful feature of this assembler is the ability to delineate sections of the source 
program to produce code in different memory areas. Thus program code and 
fixed constants can be assigned to area _text which the linker can place in mem¬ 
ory occupied by ROM, whilst section _data can be used for variable data destined 
for RAM. An example of the use of . psect is given in Table 12.121 

.ORG 

The assembler used here is configured to be relocatable, that is absolute ad¬ 
dresses are not assigned until link time (see Section 7.2). The .ORG function 
is normally used in an absolute assembler (one in which absolute locations are 
assigned at assembly time) to denote where the code commences. In the RTS as¬ 
sembler . ORG can be used in a relocatable manner relative to a label, for example: 

.PSECT _text ; Program code 

START: LDA MEMORY ; Program start (e.g. OEOOOh) 


RVECTOR: .ORG START+lFFEh ; Move on from start (e.g. OFFFEh) 

.WORD START ; Put in Reset vector 

. end 

Assuming that the section _text is linked to OEOOOh, then the code at RVECTOR 
is commanded to be placed in OEOOOh + 1 FFEh = OFFFEh. 

.BYTE, .WORD, .DOUBLE, .TEXT 

In the code fragment above, the assembler is commanded to place the double¬ 
byte constant EO-OOh in at RVECTOR: RVECT0R+1, using the .WORD directive. The 
directives . BYTE and .DOUBLE are similar, but allocate storage of 8 and 32 bits 
respectively. . TEXT allows series of bytes, entered as strings within quotes, to be 
stored in a si mil ar manner. Other assemblers use FCB (Form Constant Byte), 
RMB (Reserve Memory Byte), FDB (Form Double Byte), FCC (Form Constant 
Character) as equivalent directives. 

We have already seen an example of . BYTE when we designed the 7-segment 
decoder subroutine on page [39) A simple example of .TEXT is: 


.TEXT "This is an example", 0 
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which is considerably more convenient than the equivalent: 

.BYTE 54h,68h,69h,73h,20h,69h,73h,20h,61h,6Eh 
.BYTE 20h,6 5 h,78h,61h,60h,70h,6Ch,65h,0 

Statements such as this have to be used with caution where the program is 
blasted into ROM. Constants can be located in ROM (e.g. . psect _text). but not 
in RAM (e.g. . psect _data). This is because there is no download of code prior 
to the run, and volatile memory is unpredictable on power up. Care must be taken 
when using a simulator to debug such programs, as this data is downloaded into 
RAM from the assembled machine code file and will then appear to be available 
at start-up. 


Our first program generates the sum of all integers n up to a maximum of 
255 (FF h). We assume that n is passed to the subroutine in Accumulator_B. The 
ma xi mum possible total of 32,640 can comfortably fit into the 16-bit Accumula- 
tor_D for return. 


Table 2.9 Source code for sum ofn integers program. 
.processor m6809 


* FUNCTION : Sums all unsigned byte numbers up to n 

* ENTRY : n is passed in Accumulator B 

* EXIT : Sum is returned in D Accumulator 

* EXIT : Index X = sum 


.psect 

_text 

Direct code into text area 

; for (sum=0;n>0;n--){ 


1 dx 

#0 

Sum = 0000 

SLOOP: tstb 


n > 00? 

beq 

SEND 

IF not THEN end 

abx 


ELSE sum = sum + n 

decb 


n-- 

bra 

SLOOP 

> 

SEND: tfr 

x, d 

Put sum in D Accumulator as asked 

S EXIT: rts 


for return 

. end 




The algorithm used in Table 12.91 simply clears the initial total, temporarily 
located in the X Index register, and adds to it the progressively decrementing in¬ 
teger, kept in Accumulator_B. When B reaches zero, the grand total is transferred 
to Accumulator_D for return. The instruction Add B to X (ABX) is a convenient 
vehicle to add the 8-bit integer to the 16-bit partial summation. Without it, n 
would have to be unsigned promoted to 16-bits by zeroing Accumulator_A and 
then the instruction LEAX D ,X used for the addition. 

The source-code file is translated by the assembler program to produce a 
machine-code file, which will eventually find its way into program memory. An 
absolute listing file is also generated, which documents the machine code and its 
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location together with the original source code. The listing of Table (ZTOl shows 
the outcome of the translation, with the line number, location and machine code 
occupying the leftmost three columns. This type of file is often referred to as 
object code. The absolute location of the machine code is decided by the linker- 
locator program, as described in Section 7.2. All 6809-based programs in this text 
assume ROM from EOOO/i upwards for the program sections designated _text, 
and RAM from OOOOh upwards for the _data sections. Only _text is needed in 
this case. 

Table 2.10 Object code generated from Tahlp i2.b\ 

1 .processor m6809 


2 

B 

* 

FUNCTION 

Sums all unsigned byte numbers up to n 


4 

* 

ENTRY 

n is passed in Accumulator B 


5 

* 

EXIT 

Sum is returned in D Accumulator 


6 

7 

8 

* 

EXIT 

Index X = sum 



9 




psect 

_text 

Direct code 

into text area 

10 

; for (sum= 

=0; n>0; n — ) { 





11 

E000 

8E0000 

SUM_0F_N: 

1 dx 

#0 

Sum = 0000 


12 

E00B 

5D 

SLOOP: 

tstb 


n > 00? 


13 

E004 

2704 


beq 

SEND 

IF not THEN 

end 

14 

E006 

3A 


abx 


ELSE sum = 

sum+n 

15 

E007 

5A 


decb 


n-- 


16 

E008 

20F9 


bra 

SLOOP 

} 


17 

E00A 

1F10 

SEND: 

tfr 

x,d 

Put sum in 

3 Accumulator as asked 

18 

E00C 

39 

S_EXIT: 

rts 


for return 


19 




end 





The program of Table [ZTD1 is 12 bytes long and takes 16 + 13n cycles (max¬ 
imum 3331). An alternative algorithm recognizes that the total is given by the 
expression n x (n + 1) 2. In Table ETU this is implemented by copying n into 

Accumulator_A, incrementing it, multiplying the two Accumulators and doing a 
single double-byte shift right (i.e. -p 2). Only six bytes long and executing in a 
fixed 28 cycles, this illustrates that time taken in refining the problem algorithm 
can be profitable. However, there is a bug in this implementation, with one value 
of n giving an erroneous zero answer. Can you determine which, and recode to 
avoid this problem? 


Our second program is more elaborate. We are required to convert a 16-bit 
binary word to a string of ASCII-coded decimal digits, ter mi nated with 00 h (ASCII 
NULL). The more usual mathematical conversion algorithm requires that the base- 
M number be continually divided by ten, the series of remainder digits being the 
base-10 equivalent (see the listing of Table 14.141 . Implementing this requires 
a lengthy division/remainder subroutine. If this is already present for use by 
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Table 2.11 A superior implementation. 
1 .processor m6809 


2 

3 

V 

FUNCTION 

: Sums all unsigned byte numbers up to n 

4 

V 

ENTRY 

: n is passed in Accumulator B 

5 

v 

EXIT 

: Sum is returned in D Accumulator 

6 

7 

f 

EXIT 

: No other registers disturbed 


8 .psect _text ; Direct code into text area 

9 ; 


10 

; sum = rr 

'(n+l)/2 



11 

E000 

1F98 

SUM_0F_N: 

tfr b,a 

Copy n into Acc.A 

12 

E002 

4C 


i nca 

which becomes n+1 

13 

E003 

3D 


mul 

n*(n+l) now in Acc.D 

14 

E004 

44 


1 sra 

Divide by two 

15 

E005 

56 


rorb 

by shifting right once 

16 

E006 

39 

S_EXIT: 

rts 

for return 

17 




. end 



another program module, the resulting code will be acceptably short. In any 
case, in the absence of a hardware divide operation in the 6809, execution time 
is likely to be long. 

An alternative algorithm, which is especially suitable for small numbers, is 
illustrated in Fig. 12.41 Essentially the nth-decade digit is evaluated as the number 
of successful subtractions by 10", where n begins at the highest possible value, 
and is decremented towards zero after each decade evaluation. As the maximum 
value for a 16-bit binary number is 65,535, this requires subtraction by 10,000, 
1000, 100, 10 and 1. With the procedure being the same for each decade, it is 
easier to store the constants as a table in ROM and use a loop with an advancing 
pointer to select the decade and its corresponding table entry. This look-up table 
is shown in the listing of Table [2TT21 in line 43. Notice the additional zero word 
at the end of the table; this is used to provide an escape mechanism after the 
decade passes 10°. 

The actual subtraction of 10" is performed in line 23, with the X Index regis¬ 
ter pointing into the table of powers. If no borrow is generated (C = 0), the byte 
holding the nth string character (initialized to ASCII 0 = 30 h in lines 18-21) 
is incremented and the process repeated (lines 25-28). On emerging from this 
inner (decade) loop, the 10" constant is added back to compensate for the one 
subtraction too many. As line 30 uses the Double-Increment Index address mode 
(ADDD , X++), the table pointer is simultaneously advanced one word. LEAY 1,Y 
then increments the string pointer (the Y Index register) one byte, and the scene 
is set for the next decade evaluation. Before returning to the top of this outer 
loop, the escape condition (i.e. NULL) must be tested. There is no instruction 
to test the zero state of a double memory location; instead an unused double 
register is loaded with the word data (LDU 0,X in line 34) and the Z flag will 
be set accordingly. An alternative escape procedure would be to decrement a 
count on each loop pass or simply to check the table pointer for 0E030h (e.g. 
CMPX #PWR_10+10). Using a special terminate character is better where the length 
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Table 2.12 Object code for the conversion of 16-bit binary to an equivalent ASCII-coded decimal 
string. 

1 .processor m6809 

2 ■ ******* * * * * * * ************** * * * * * * * ****************** * * * * * * * * * * * * * 

3 ; * Converts 16-bit binary to a string of five ASCII-coded * 

4 ; * characters terminated by 00 (NULL) * 

5 ; * EXAMPLE : FFFF -> ' 6 " 5 " 5 " 3 " 5 "0' (36/35/35/33/35/OOh) * 

6 ; * ENTRY : Binary word in D * 

7 ; * EXIT : Decimal string in 6 RAM bytes starting from DEC_STRG* 

8 ; * EXIT : All register contents unchanged * 

Q ■ ****** * * * * * * * * * * ************** * * * * * * * * *************** * * * * * * * * * * * * 


11 .define NUL = 

12 .psect _text 


13 

E000 3476 

BIN_2_DEC: 

pshs 

a,b,x,y,u 

14 

; N=4 




15 

E002 308C21 


leax 

PWR_10,pc 

16 

E005 108E0000 


ldy 

#DEC_STRG 

17 

; Nth decade 

= 'O' 



18 

E009 1F03 

NEW_N: 

tfr 

d, u 

19 

E00B 8630 


Ida 

#'0' 

20 

E00D A7A4 


sta 

o,y 

21 

E00F 1F30 


tfr 

u, d 

22 

; Binary - 10 

**N 



23 

E011 A384 

NEXT_SUBT: 

subd 

0, x 

24 

; Can do? 




25 

E013 2504 


bcs 

NEXT_DEC 

26 

; IF Yes THEN 

increment 

Nth decade 

27 

E015 6CA4 


i nc 

o,y 

28 

E017 20F8 


bra 

NEXT_SUBT 

29 

; ELSE restore 10**N to 

binary 

30 

E019 E381 

NEXT_DEC: 

addd 

, X++ 

31 

; N = N -1 




32 

E01B 3121 


leay 

i.y 

33 

; N < 0? 




34 

E01D EE84 


ldu 

0, x 

35 

; No 




36 

E01F 26E8 


bne 

NEW_N 

37 

; Yes 




38 

E021 6FA4 


cl r 

o,y 

39 

; End 




40 

E023 3576 


puls 

a,b,x,y, u 

41 

E025 39 


rts 



. * * * * * * * * * * * * * ************** * * * * * * • 

42 ; This is the table of powers of 10 

43 E026 2710 PWR_10: .word 10000 

03E8 
0064 
000A 
0001 
0000 

■ * * * * * * * * * * ************** * * * * * * * * * * * * * * * ' 

44 ; This is the area of RAM where the numbe 

45 .psect _data 

46 0000 DEC_STRG: .byte [6] ; 

47 .end 


0000 

; Save pointer registers used 

; Point to table bottom (10A4) 

; Point to beginning of string in RAM 

; Put away binary for safekeeping 
; Put ASCII 'O' in nth decade of string 

; Get binary back 

; A Carry/borrow means No 


; Advance one decade 
; Look for double-byte NULL in table 

; IF Yes terminate the string 

y,u ; Return old register values 

; Omit if above is puls a,b,x,y,u,pc! 

* ****************** * * * * * * * * * * * 

1000,100,10,1,NUL 

************ * * * * * * * * * * * * 

string is to be returned 

Reserve six memory bytes for string 


of the table can vary, and is the normal approach to character strings, as is spec¬ 
ified in this example (line 38). 

None of the MPU's registers are altered by this subroutine, except the Code 
Condition register. A subroutine with this property is known as transparent. This 
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is achieved by pushing the used registers onto the System stack at the beginning 
(line 13) and restoring them at the end (line 40). In general the number of Push 
and Pull operations should match to ensure that the System Stack Pointer is back 
up to the return Program Counter, which was shoved out automatically when the 
subroutine was called. Thus Return from Subroutine (RTS) will then be able 
to retrieve the original PC as required. One trick sometimes seen is to add the PC 
to the last PULS, which of course does the same thing; thus: 

PULS A,B,X,Y,U,PC 

is the same as 

PULS A,B,X,Y,U 
RTS 

The two pointers, X to the table and Y to the string, are set up just after 
the initial Push. The table pointer is set up in line 15 using the Program Counter 
Relative address mode, LEAX PWR_10, PC. Looking at the machine code produced 
(namely 30-8C-21 h), shows an operand of 21 h, being the distance between the 
PC (pointing at execution time to the following instruction at 0E005h) and the 
start of the table at 0E026/1. This relative operand ensures that no matter where 
the program/table ROM is placed in address space, the code need not be altered. 
This code is strictly speaking not position independent, as the string is in a fixed 
location in the _data program section, that is in RAM. If DEC_STRG is the first 
occurrence of .psect _data, then our linker will place the string at locations 
0000 h to 0005h. Thus the code in line 16 for LDY #DEC_STRG is 108E-0000h. 
We could use the Program Counter Relative mode here (i.e. LEAY DEC_STRG, PC) 
but this would mean that the address distance between the ROM and RAM chips 
would have to remain constant, and they could not be independently relocated: 
not very convenient. 


Our last example also has a mathematical flavor. We are required to calculate 
the factorial of an integer n passed in Accumulator_B. The factorial of n (repre¬ 
sented as n!) is defined as n x (n - 1) x (n - 2) x ■ ■ ■ x 3 x 2 x 1. By convention 
0! is defined as 1 0. 

Superficially this appears to be the same as our first example, but with multi¬ 
plication replacing addition, see Fig. |2.5l However, the product rapidly becomes 
very large, with 12! = 479,001,600 being the largest factorial fitting into a 32-bit 
binary number. Thus we will restrict n to the range 0-12, and will have to return 
nl in four memory bytes, as no 6809 register of this size is available (although it 
could be returned in two pieces using, for example, the X and Y Index registers). 
Furthermore, we will use Accumulator_B to return an error status byte of FFh if 
the programmer sent an out of range integer (n > 12), otherwise 00 h indicating 
success. 

Our first problem is that product generation is a 4-byte long-word process, 
whilst the 6809 can only perform an 8 x 8 multiplication. Thus our requirement 
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for an 8 x 32 product will have to be met by four 8x8 operations. Hence we will 
require four memory bytes to hold the product (after each total multiplication) 
and at least four memory bytes to act as a temporary workspace, where the four 
multiplications will be summed as they happen. 

The initial value of the product is set to 0001 h in lines 21-25 of the listing 
in Table [27i~3l and the 7-byte temporary workspace cleared in lines 30-33. The 
actual 4-stage multiplication of the partial product to the integer byte n takes 
place in the following lines 35-48. This is shown diagrammatically at the right 
of Fig. 12.61 from where it can be seen that each process is similar, but with the 
addition shifted left once each move towards the MSB of PROD. Thus the word 
n x [PR0D+3] is added to TEMP+5 :TEMP+6, with any carries up to TEMP+3 (no 
more, as we know the result will never exceed four bytes). The second product of 
n x [PR0D+2] is added to TEMP+4 :TEMP+5, with any carry to TEMP+3. The word 
n x [PR0D+1] is summed to TEMP+3 :TEMP+4, whilst the same 4-byte restriction 
means that only the lower-byte of the final product n x [PROD] (i.e. [B]) need be 
added to the temporary store. 


PROD PROD+3 




Overflow 
_/\_ 


TEMP 


X 


TEMP+6 



Third multiplication 

and shifted—twice addition 



Fourth multiplication 
and shifted—thrice addition 


Figure 2.6 A memory map of the factorial process. 

From the above discussion, we see that the addition process is different at 
each position, as the 16-bit result from the multiplication slides' from right 
to left. This is a pity, as otherwise the four multiply/add steps are the same. 
This inefficiency can be circumvented by allowing three buffer temporary bytes, 
as shown dashed at the top of Fig. 12.61 This allows us to put the multiply/add 
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Table 2.13 Fundamental factorial-n code. 

1 .processor m6809 

2 ■ ****** * * * * * * * ***** * ******** * * * * * * * * * * * * * * * * * ************** * * * * * * * * 

3 ; * Subroutine calculates the factorial of n (n!) * 


4 


EXAMPLE : n = 12; n! = 

479,001,600 * 

5 


ENTRY : n in Acc.B; maximum value 12 * 

6 


EXIT : n! in 

4 bytes PROD -> PROD+3 * 

7 


EXIT : Acc.B 

= -1 (FFh) if error (n>12) ELSE 00 * 

8 

. *********** ******* 

************************************************ 

9 



.define ERROR = 

-1 

10 

: Initialize 




11 



.psect 

_text 


12 

E000 

3412 FACTORIAL: 

pshs 

a,x 

Save these registers 

13 

E002 

3404 

pshs 

b 

Put n away for safekeeping 

14 

; Error condition 




15 

E004 

C10C 

cmpb 

#12 

IF >12 THEN an error condition 

16 

E006 

2306 

bis 

CONTINUE 

ELSE continue 

17 

E008 

C6FF 

ldb 

#ERROR 

Put FFh in B to signal error 

18 

E00A 

E7E4 

stb 

0, s 

and into where it is in the stack 

19 

20 

E00C 

204B 

bra 

FEXIT 

and exit with it 

21 

E00E 

7F0003 CONTINUE: 

cl r 

PROD+3 

Initialize product to OOOlh 

22 

E011 

7C0003 

i nc 

PROD+3 


23 

E014 

7F0002 

cl r 

PROD+2 


24 

E017 

7F0001 

cl r 

PROD+1 


25 

E01A 

7F0000 

cl r 

PROD 


26 

; N <=1? 




27 

E01D 

E6E4 OUTER_LOOP 

: ldb 

0, s 

Get factor n (or residue) back 

28 

E01F 

C101 

cmpb 

#1 


29 

E021 

2336 

bis 

FEXIT 

IF <=1 then answer is in PROD 

30 

E023 

8E0004 

ldx 

#TEMP 

Now clear temporary product area 

31 

E026 

6F80 CLOOP: 

cl r 

, x+ 

all five 7 bytes 

32 

E028 

8C000B 

cmpx 

#TEMP+7 


33 

E02B 

26F9 

bne 

CLOOP 


34 

; Now begin the multiple multiplication (PROD = PROD*n) 

35 

E02D 

8E0004 

ldx 

#PR0D+4 

Point to just past LSB product 

36 

E030 

E6E4 MUL LOOP: 

ldb 

0, s 

Get residue of n from stack 

37 

E032 

A682 

Ida 

> -x 

and ith byte of product, i++ 

38 

E034 

3D 

mul 


[D] holds the product 

39 

E035 

E306 

addd 

6,x 

Add it to temporary product 

40 

E037 

ED06 

std 

6,x 

6,x points into temp product area 

41 

E039 

A605 

Ida 

5 , x 

Now add any carry to the third byte 

42 

E03B 

8900 

adca 

#0 


43 

E03D 

A705 

sta 

5 , x 


44 

E03F 

A604 

Ida 

4, x 

and to the next higher byte 

45 

E041 

8900 

adca 

#0 


46 

E043 

A704 

sta 

4, x 


47 

E045 

8CFFFF 

cmpx 

#PR0D-1 

i over the MSB product 

48 

A Q 

E048 

26E6 

bne 

MUL_L00P 

IF not then again once left 

50 

E04A 

3001 

1 eax 

1, x 

Increment pointer to MSD product 

51 

E04C 

A607 MOVE LOOP: 

Ida 

7, x 

Moving temporary product bytes 

52 

E04E 

A780 

sta 

, x+ 

which is the new product 

53 

E050 

8C0004 

cmpx 

#PR0D+4 

to its rightful place 

54 

E053 

26F7 

bne 

MOVE_LOOP 


55 

; n=n-l 




56 

E055 

6AE4 

dec 

0, s 


57 

E057 

20C4 

bra 

OUTER LOOP 


58 

E059 

6FE4 FEXIT: 

cl r 

0, s 

Zero (no error), to B in stack 

59 

; End 




60 

E05B 

3504 ERR EXIT: 

pul s 

b 

Gets error condition from stack 

61 

E05D 

3512 

pul s 

a,x 

Retrieve used registers 

62 

63 

E05F 

39 

rts 


n! is in the four PROD locations 

64 

’ 


.psect 

data 

Define the data area 

65 

0000 

PROD: 

. byte 

[4] 

The area holding the product 

66 

0004 

TEMP: 

. byte 

[7] 

The temporary product area 

67 



. end 
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process in a loop ensuring that no other data object is inadvertently altered by the 
slide leftward. In lines 36-48 of this loop, the X Index register is used to point to 
both the relevant product byte (line 37) and, with offset, to the temporary addition 
target bytes (lines 39-46). When the multiplication is over, the result becomes 


Table 2.14 Factorial using a look-up table. 

.processor m6809 

* * * * * * * * * * * * * * * * ********* * * * * * * * * * * * * * *************** * * * * * * * * * * * * * * 

* Subroutine calculates the factorial of n (n!) * 


EXAMPLE 

ENTRY 


n = 12; n! = 479,001,600 
n in Acc.B; maximum value 12 


6 

| 

EXIT 

: n ! i n 4 

bytes 

PROD -> PROD+3 * 

7 

; 

EXIT 

: Acc.B = 

-1 (FFh) if error (n>12) ELSE 00 * 

8 

. ********* 

V * * * * * * * * ****** * * * * * * * * * * * * * * ******************** * * * * * * * * * 

9 




.list 

+ . text 


10 




.define ERROR = 

-1 

11 

j 

Initialize 




12 




.psect 

_text 


13 

E000 

3430 

FACTORIAL : 

pshs 

x,y 

Save registers 

14 


Error condition 




15 

E002 

C10C 


cmpb 

#12 

IF >12 THEN an error condition 

16 

E004 

2304 


bis 

CONTINUE 

ELSE continue 

17 

E006 

C6FF 


ldb 

#ERR0R 

Put FFh in B to signal error 

18 

E008 

2016 


bra 

FEXIT 

and exit with it 

19 


Get factorial out of table 


20 

E00A 

58 

CONTINUE: 

1 si b 


Multiply n by four 

21 

E00B 

58 


1 si b 


as table is 4-wide 

22 

E00C 

8EE023 


ldx 

#TABLE 

Point to bottom of table 

23 

E00F 

3085 


leax 

b,x 

Point to relevant table entry 

24 

E011 

10AE81 


ldy 

, x++ 

Get top two bytes, and advance pointer 

25 

E014 

10BF0000 


sty 

PROD 

and put away 

26 

E018 

10AE84 


ldy 

0, x 

Get lower two bytes 

27 

E01B 

10BF0002 


sty 

PROD+2 

and put these away 

28 

E01F 

5F 


cl rb 


Signal no error state 

29 

E020 

3530 

FEXIT: 

puls 

x,y 

Retrieve used registers 

30 

E022 

39 


rts 


n! is in the four PROD locations 

31 

J 






32 

J 

Now the 

table which 

is in 

the text (ROM) area 

33 

E023 

00000001 

TABLE: 

.double 1,1,2,6,24,120,720,5040 


34 E043 


00000001 

00000002 

00000006 

00000018 

00000078 

000002D0 

000013B0 

0000 

9D80 

0005 

8980 

0037 

5F00 

0261 

1500 

1C8C 

FC00 


.word 0,9d80h,5,8980h,37h,5f00h,261h,1500h,lc8ch,OfcOOh 


35 ; 

36 

37 0000 

38 


.psect _data 
PROD: .byte [4] 

. end 


Define the data area 

The area holding the product 
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the new product (lines 51 - 56). n is decremented in situ on the System stack, 
using the System Stack Pointer as an Index register (line 56), and the process 
continued until n = 1 (line 28). On exit Accumulator_B is cleared to indicate 
success (line 58), unless n is >12 on entry, in which case FF h is put into B (line 17), 
and an i mm ediate exit made. Notice how four bytes for the product and seven 
temporary locations are reserved in the data program section (RAM) in lines 65 
and 66. 

As there are only 13 legitimate outcomes of the program for n = 0 — 12, a 
more efficient technique is to use a look-up table. The coding for this approach 
is shown in Table [ZTT] Basically, the X Index register is pointed to the bottom of 
TABLE (line 22) and n (stored in Accumulator_B) is used as an offset to point into 
the relevant area. As each table entry occupies four bytes, B must be multiplied 
by four (by shifting twice left in lines 20 and 21), so that it goes up in 4-byte steps. 
The operation Load Effective Address into X with the address mode B, X points 
X to the entry in line 23 (the maximum value of B is 48, thus its sign extension 
inherent with this address mode will have no deleterious effect). Now the high 
word can be moved from the table to 2 bytes of memory via Index register_Y 
(lines 24 and 25). As the Indexed with Post Double Increment address mode 
is used, X will automatically point to the lower word, for a repeat performance 
(lines 26 and 27). 

The coding shows the assembler directive .DOUBLE being used for the first 
eight table entries and .WORD twice for each of the remaining entries. This is 
deliberate, as the assembler used here has a bug which gives incorrect values 
for .DOUBLE above 32,767 (00007FFFh). Assemblers, as all other software, are 
not immune to bugs! See Table [47141 for a look-up table using .DOUBLE for this 
situation. 

It is interesting to compare the performance of the two implementations. The 
former mathematical algorithm requires 96 bytes of ROM and 11 of RAM. Its 
operation time varies with n, from 53 cycles with n = 0 or 1 to 1724 cycles with 
n = 12. The tabular approach takes 84 bytes of ROM and 4 of RAM, and takes a 
fixed 42 cycles for n between 0 and 12. In both cases an error situation requires 
30 cycles. The conclusion is obvious. 
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CHAPTER 3 


The 68000/8 Microprocessor : Its 
Hardware 


At its inception, the microprocessor was perceived as a replacement for many 
applications then implemented by standard logic circuitry. The considerable en¬ 
hancement of facilities offered by second generation MPUs led to their use as the 
engine of a number of simple general purpose computers, such as the APPLE II. 
Whilst these were initially targeted at the home and education markets, the evolu¬ 
tion of affordable magnetic disk technology quickly created an explosive growth 
in their use in the business and scientific communities. 

The large potential market thus opened up was the impetus in the develop¬ 
ment of a new generation of more powerful MPUs. Although, as we have seen, 
there was some movement in that direction by 8-bit devices, in the main the 
opportunity was taken to expand the internal architecture to use 16 and 3 2-bit 
registers and ALUs. As well as increasing the data throughput, especially where 
floating-point computation is being used, this makes it easier to support larger 
external buses. Along with enhanced power, a larger memory space and support 
for data structures targeted to high-level languages was provided. Typically an 
increase in execution speed of around ten was achieved by this strategy. 

The Intel 8086/8088 16-bit MPU released in 1978 was designed to be com¬ 
patible with the older 8-bit 8080/8085 MPUs, and as such perpetuated many of 
their limitations. Internal registers were dedicated, rather than general purpose, 
and the address range of 1 Mbyte was fractured into 64 Kbyte segments. Later 
members of the family increased the register sizes and address capacity, with the 
32-bit 80386/486 being able to address 2 32 bytes. This family was popularized 
by their use in the IBM series of personal computers. 

First devices from the Motorola 68000 family were released in 1979 E1EDD. 
In contrast these took the chance of breaking completely with the past gener¬ 
ation. The 68000 MPU offered a 32-bit register structure from the beginning, 
although the 16-bit data bus and ALU really marks it as 16-bit with 3 2-bit preten¬ 
sions. A non-multiplexed address bus with effectively 24 lines gives a 16 Mbyte 
directly addressable memory capacity. This was later extended to 32 lines in 
the 68020/30/40 devices, giving a potential 4 Gbyte memory size. All the 8086 
family as well as 68020 up, provide the capability to easily ride tandem with a 
floating-point hardware co-processor; which considerably extends their capabili¬ 
ties in mathematics intensive computing, such as computer-aided design graph- 
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ics. In general the 68000 family is found in the more powerful personal comput¬ 
ers, such as the APPLE Macintosh, as well as graphic workstations such as the 
Hewlett Packard Apollo DN series. 

All this growth in raw power has made the microcomputer at least as powerful 
as a minicomputer from the last decade, but there has also been a spin-off into 
the area of embedded microprocessor circuitry, with which we are concerned in 
this text. Although the current 8-bit microprocessors are adequate in the ma¬ 
jority of embedded applications, either singly on in multiple-processor config¬ 
urations, many of the more powerful tasks are being implemented using these 
newer devices. This is not necessarily due to their virtues, but because more aids 
to hardware and software design, which have appeared in the last decade, have 
been targeted in this direction. This is especially true in the field of compiler and 
simulator work. 

The 68000/8 is the second of our MPUs we have chosen to illustrate high-level 
language techniques. This and the following chapter overviews its hardware and 
software features. 


3.1 Inside the 68000/8 

Here we look at the register model of the 68000 and 68008 MPUs. A highly 
simplified equivalent circuit of the former is shown in Fig. 13.11 Following the 
classification developed in Section 1.1, we will discuss the internal attributes of 
the device in terms of the mill, the register array and the control unit. 

THE MILL 

A 16-bit ALU implements in hardware the arithmetic operations of Addition, Sub¬ 
traction, Multiplication and Division; the former with and without carry/borrow 
and the latter in signed and unsigned representations. The logic operations of 
AND, OR, Exclusive-OR, NOT and Shift are also provided. 

Five flags in the associated Code Condition register (CCR) provide a status 
report on ALU activity. The Carry, Negative, Zero and 2's complement overflow 
semaphores are standard, but the extend flag needs some explanation. The X flag 
is similar to Carry, but is only affected by Addition, Subtraction, Negate and cer¬ 
tain Shift operations. Multiple-precision versions of these instructions use the 
X flag for their carry; thus the fa mi liar ADd with Carry (ADC) instruction ap¬ 
pears here as ADD with extend (ADDX). For example, this means that a Compare 
operation, which of course affects the C flag, can be done in between multiple- 
precision operations without affecting the ' true' carry information (which is in 
X). 

We shall see that the 68000 MPU directly operates on byte (8-bit), word (16-bit) 
or long-word (32-bit) data. All CCR flags operate correctly (eg. Carry from bit 7, 
bit 15 or bit 31 respectively), automatically reflecting the operand size. 

As shown, the Code Condition register occupies the lower byte of the 16- 
bit Status register; the upper field containing masks and bits which control the 


58 C FOR THE MICROPROCESSOR ENGINEER 


Address Bus 

^CT'CONvDin^rroai«o n ._ l . 


Data Bus 

CSS^^Stncor-.^DinM-ncu^o 

? : S : S5 : S : 3‘S'S'S'S15‘S7315?'S 



Figure 3.1 Internal structure pfthe 68000. 







Status register 

Control register Code Condition register 


INSIDE THE 68000/8 59 


T 


S 



12 

h 

lo 




X 

N 

Z 

V 

c 


: : Interrupt 

: : Mask 

: : Priority 

: : Level 

: Supervisor/User state 

Trace on/off 


: : : : Carry/Borrow 

: : : overflow (2's complement) 

: : Zero outcome 

: Negative (MSB=1) 
extend carry 


operating mode of the processor. The three bits 12 II 10 represent the Interrupt 
mask. The MPU will only respond to an interrupt request signalled externally 
on pins IPL2 IPL1 IPLO, (IPL stands for Interrupt Priority Level) if this active-low 
IPL number is above the mask number. For example, an IPL number Low High 
Low (active-low 5) will trigger a level-5 request (IRQ5 in Fig. 13. ID . if the mask is 
set at between 000 and 1 00. The exception is a level-7 request, which is non¬ 
maskable. More details are given in Section 6.1. The mask is set to level 7 at 
Reset, thus inhibiting all but a non-maskable interrupt. 

The 68000 MPU leads a Jeckyll and Hyde existence, in that it has two states 
of existence, which are virtually independent of each other. These are somewhat 
more prosaically termed the Supervisor and User states. When the MPU is Reset, 
the S bit in the Status register is automatically set to 1, the Supervisor state. 
Certain, so called privileged instructions, can only be executed in this state. 
These instructions generally deal with the overall operation of the processor. For 
example it is only possible to change the Interrupt mask in the supervisor state; 
for example: 


MOVE #00100 


100 00000000b,SR 


sets the mask to level 4. Moving data into the Status register is a privileged 
instruction (but not reading it). 

The only way to exit the Supervisor state is to clear the S bit, for example: 


ANDI #ll[o]lllllllllllllb ,SR 

will clear bit 13 of the Status register and leave all else unchanged. As you might 
expect AND Immediately to Status Register is privileged, as is 0RI #data, SR 
(to set individual bits) and E0RI #data, SR (to toggle individual bits). 

Once in the User state you cannot return to the Supervisor state by simply 
setting the S bit as the MOVE, ANDI, ORI and E0RI #data,SR instructions are 
illegal in this situation, (but note that the same instructions targeted to the CCR 
part of the Status register are perfectly legal; for instance: 


ORI #00000001b,CCR 
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sets the Carry flag. The only way back to the Supervisor state is when an inter¬ 
rupt or Trap occurs (a Trap is a type of Software interrupt, and is described in 
Chapter 6). 

What is the point in having two distinct states? In a multitasking environment 
(more than one program running concurrently on the same machine) it is usual 
to have a master program, known as the operating system. The operating system 
provides resources to the user program, such as an interface to a magnetic disk 
store. Where more than one user program appears to run simultaneously, it may 
switch between these programs in a time-slice manner in a fairly complex way (4). 
As a simple example, consider a microprocessor development system to which 
software can be downloaded into RAM, whence it can be run and tested. The 
operating system, here called a monitor, usually resides in ROM. Once control is 
passed from the monitor to the user program running in real time, the only way 
back to the operating system is via a Software interrupt, Hardware interrupt or 
Reset. In all these situations it is important to ensure that user programs do not 
corrupt memory or other resources used by the operating system. 

In the 68000 MPU, this operating system runs in the Supervisor state, into 
which it enters automatically on Reset. The MPU informs the outside world which 
mode it is running by using the three Function Code pins FC2 FC1 FCO, as detailed 
in Section 3.2. Thus the hardware engineer can design the address decoder to 
access Supervisor ROM and RAM chips in an entirely separate address space than 
that accessible to the User program. Furthermore, the Supervisor and User modes 
have separate System Stack Pointers, the Supervisor Stack Pointer (SSP) and User 
Stack Pointer (USP). Thus, in reality there are two A7 registers, only one of which 
is active in any mode. Both separate and mutually exclusive memory spaces 
and System Stack Pointers make it difficult for the user program to accidentally 
corrupt the operating software. 

In small dedicated embedded systems there often is no operating system as 
a separate entity. In such naked cases, it is normal to stay in the Supervisor 
state and ignore the existence of the User state. We will do this for our project 
in Part 3. However, the security of a two distinct states is important for the 
reliable operation of more sophisticated embedded systems, especially where an 
extensive interrupt driven configuration is being used. 

Finally, bit 15 of the Status register is the Trace bit. When set to 1, a Software 
interrupt/Trap will occur at the end of each instruction execution. This can be 
used in conjunction with a suitable operating system routine to print out infor¬ 
mation, such as the register contents after each step of the program 0. The 
Trace bit is turned off on Reset. 

REGISTER ARRAY 

As in all microprocessors, the 68000 has a Program Counter (PC) which essentially 
points to the next instruction to be fetched. With this MPU, the situation is a 
little more complex. This is because use is made of time when the external buses 
would normally be idle, to bring down words from program memory into a 2-word 
prefetch queue buffer |l]. For example, when a Branch is executed, both the next 
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instruction and the Branch-to op-code will already be in the buffer. Which one 
executed, will depend on the outcome of the condition test. Like most of the 
registers, the PC is 32-bit wide, but in the basic 68000 MPU only the lower 24 bits 
have any connection to the external address bus. 

Two arrays of eight 32-bit registers are of major concern to the programmer. 
These are functionally divided into Data and Address registers. Data registers 
provide the source or/and the destination data for most operations. The Ad¬ 
dress registers hold pointers to data stored outside in the MPU's memory. Mo¬ 
torola have made a considerable effort to ensure that these registers behave in a 
consistent and regular manner (they use the term orthogonal); for example any¬ 
thing that can be done on DO can also be done in exactly the same manner on, 
say, D7. However, they have made a clear distinction between registers holding 
operational data (Data registers) and those used to compute addresses (Address 
registers). 

The eight Data registers are the equivalent to the one or two Accumulator reg¬ 
isters found in most 8-bit MPUs. Most instructions use at least one Data register 
to hold a source or destination operand; for example: 

ADD.L [ea],D0 ; [DO] <- [DO] + [ea] 

adds the 32-bit long operand at some effective address (ea) to DO, answer in DO. 
ADD.L Dl,[ea] ; [ea] <- [ea] + [Dl] 


adds the long operand at some ea to D1, answer in ea. 


Any Data register can be treated as an 
example: 

CLR.L D2 ; [02(31:0)] <- 

MOVE.B #0FFh,D2 ; [D2(7:0)] <- 

MOVE.W #0FFFFh,D2 ; [D2(15:0)] <- 

MOVE.L #0FFFFFFFFL,D2 ; [D2(31:0)] <- 


-bit, 16-bit or 32-bit Accumulator; for 


00000000 00000000 00000000 00000000 
00000000 00000000 00000000 11111111 
00000000 00000000 11111111 11111111 
11111111 11111111 11111111 11111111 


Any bits outside the target field remains unchanged. I have used the notation 
D2 (n :m) as meaning bits n through m of Data register 2. Most instructions acting 
on Data registers come in all three size varieties, indicated to the assembler by 
using the extensions . B (for byte), . W (defaults to word) and . L (for long-w ord). 
Two bits in the op-code word are used to represent the size, as shown in Fig. 14.41 
There are also a few instructions which can affect any bit in a Data register; for 
example: 


B5ET #12,D4 ; Sets bit 12 of D4.L high, the rest unchanged. 

In order to make it difficult to use an Address register for anything other than 
its legitimate role, only a small range of special instructions can be used to alter 
their contents. For example, to set up A0 to address OOOOCOOOh we have: 

M0VEA.L #0C000L,A0 ; [A0(31:0)] <- OCOOOh 


An ordinary MOVE cannot target an Address register, although it is possible to 
copy the data in an Address register to a Data register; for example: 
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MOVE.L A1,D4 ; [04(31:0)] <- [Al(31:0)] 

Other Address register modification instructions include (ADD to Address 
register) ADDA, (Compare with Address register) CMPA and (Subtract from 
Address register), SUBA. Except for CMPA, such operations do not affect the 
CCR flags. Only long and word-sized operations are allowed. The full 32-bits are 
always affected, even where word-sized operations are used. In this case, bit 15 
is sign-extended to 32 bits; for example: 

MOVEA.W #0C000h,AO ; [A0(31:0)] <- FFFFCOOOh 

There are no byte-sized operations on Address resisters. 

Like the Data register array, all Address registers behave in the same way, 
except A7 is special in that it is used as the System Stack Pointer for subroutines 
and interrupts. The MOVE Multiple (MOVEM) instruction, which when targeted to 
A7 is equivalent to Push and Pull in other MPUs, can also be used with any other 
Address register (see Section 4.1). 

The Address registers have their own arithmetic circuitry, allowing effective 
addresses to be calculated in parallel with any data calculation. Like the 6809 MPU, 
the 68000 has an extensive range of Indexed addressing modes; for example: 

MOVE.B 64(A0,D7.L),DO ; [D0(7:0)] <- [64+[AO(31:0)]+[D7(31:0)] 

copies the contents of the data byte located in wherever A0 points to plus the 
32-bit variable in D7 plus the constant 64 into the lower byte of DO! Incidentally, 
if there is going to be lots of activity around this area of memory, the instruction: 

LEA 64(A0,D7.L),A1 ; [Al(31:0)] <- 64+[A0(31:0)]+[D7(31:0)] 

puts the effective address (A0.L plus D0.L plus 64) in Al; and future accesses 
can be made without further calculation using Al as a pointer. More about Load 
Effective Address and address modes in Chapter 4. 

CONTROL CIRCUITRY 

The 68000's Instruction decoder uses a microcoded design |I6]| as opposed to 
the random logic employed by 8-bit processors, such as the 6809. The order of 
magnitude increase in complexity exhibited in 4th generation devices makes the 
design and testing of the more efficient random logic circuitry difficult. Thus 
the disadvantages of larger and slower circuitry are considered more than offset 
by the advantages of simplicity of design and testing, as well as the flexibility 
of an easier change or enhancement of operation. In a microcoded design, the 
sequence of steps in implementing an instruction are stored in integral ROMs 0. 


The 68008 MPU is an 8-bit data bus version of the 68000. Despite the reduced 
external functionality, as can be seen from Fig. 13.21 internally the two processors 
are the same. Software is identical for both processors, although execution times 
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are typically 40% longer, due to the larger numbers of 8-bit fetches, as opposed 
to 16-bit equivalents 0. This still makes the 68008 a powerful alternative to a 
purely 8-bit MPU, and it is often used for this purpose in embedded MPU circuitry. 
Although the device itself is similar in price to its bigger brother; the smaller 
package, bus width and number of memory chips (see Figs. 13. Ill 13. 121 and II 3. 31 
considerably reduces board space and hence costs. 


3.2 Outside the 68000/8 

The 68000 MPU is available in a 64-pin package, which is shown in Fig. 13.31 to¬ 
gether with the 48-pin 68008. Unlike the 8086 family, all bus signals are non- 
multiplexed. All signals are TTL voltage-level compatible. The 68HC000 is a 
CMOS version with slightly different electrical and timing specifications. Unless 
otherwise stated, figures are given for the normal ffMOS version. 


ADDRESS BUS and ADDRESS STROBE (a n & AS) 


The term address' is normally used in a rather careless way without qualification. 
Address of what? In an 8-bit processor, at the hardware level it can be taken as 
the bit pattern on the address bus, which is externally decoded to physically 
enable the target 8-bit byte in memory or port onto the 8-bit data bus. Thus it 
is a byte address. In a 16-bit processor, it is a word address; that is, points to 
a word in memory space. In a high-level language, what meaning do we attach 
to the address of, say, an object comprizing an array of ten byte-sized elements 
stored in consecutive memory locations? The general convention is to specify 
the lowest byte address of the object. This is mainly for historic reasons, as MPU 
technology came of age with 8-bit devices. Thus, if the array fred[ ] is stored 
in memory between byte addresses 01 CO3Oh and 01 CO3Ah, then its address is 
01 C030h. In the 68000 MPU this base address is used for word and long-word 
sized objects. Thus the instruction M0VE.W 01C030h,D0 will bring the object 


MSB 


1C030 h 


LSB 


1C03U1 


down into D0(1 5:0). 


The physical address bus reflects this natural word size by omitting line ao- 
Thus each pattern on the bus a 23 - a] spans two internal byte addresses a 23 - ao, 
one even and one odd. As we shall see, the missing ao line is implicitly available 
in the guise of the two Data Strobe lines. Up to 8 Mwords or 16 Mbytes are directly 
accessible on this address bus. The 68008 MPU has a natural byte-sized word, as 
reflected in its byte organized address bus, which does explicitly provide an ao 
line. This 68008 has 20 address pins, from aj g to ao, giving a 1 Mbyte address 
space (there is a 52-pin version with 22 lines). 
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Figure 3.3 68000 and 68008 DIL packages. 

Address_Strobe (AS) is asserted when the state of the address bus is valid, see 
Figs. lB.OI & IB.ZI When enabled, the address lines can drive up to four LSTTL loads 
into a 130 pF capacitive load. AS can similarly drive six LSTTL loads. Both sets of 
lines are off when in a direct memory access (DMA) mode, whilst only the address 
bus is off when halted. 

DATA BUS and DATA STROBES (d n & LJDS / LDS / DS) 

The 68000 MPU uses a single bi-directional 16-bit data bus to carry both in- 
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struction and operand data to the MPU (Read) and from (Write). There is a prob¬ 
lem here, in that the 68000 sees a byte organized world out there through a 
word-sized eye. Figure 13.41 shows the execution cycle of a MOVE instruction in 
byte, word and long-word versions. In the case of a Read-byte action, the ac¬ 
tual data lines used for the transfer depend on whether an even address (upper 
eight lines) or odd address (lower eight lines) is specified. Data as considered 


in byte-sized lumps is organized as 


, EVEN 

, „ ODD 

UDS 

LDS 


Thus the 


Upper_Data_Strobe is seen to be equivalent to the missing ag (active when ag 
is 0, that is on even byte addresses) and Lower_Data_Strobe is active when ag is 1 
(odd byte address). Thus the two Data Strobes have a dual role. Firstly they signal 
when data is valid during a Write action, as shown in Fig. 13.71 Secondly they can 
enable either the upper or lower byte of an addressed word, effectively enabling 
the 16-bit data bus to carry a single byte f rom a wo rd-or ganized memory space. 

A word transfer is signalled with both UDS and LDS being together asserted, 
and the two bytes feeding the bus simultaneously. Notice that the most signif¬ 
icant byte (MSB) is always in the even address (lower byte address) in common 
with all Motorola MPUs (see page [20}. A long-word transfer simply involves two 
word transfers in sequence. As can be seen, the execution time here is longer by 
four clock periods (see Fig. 13.51 due to the extra transfer cycle. Byte and word 
execution takes the same time. In both word and long-word cases the data has to 
be organized starting with an even address (MSB). Attempts to do an odd-address 
word or long-word access; for example: 


M0VE.W 0C101h,DO ; This is erroneous 


is an error, and the 68000 will terminate execution by returning to the Supervisor 
state via an Address Error Trap (see Section 6.2). 

The 68008 has only a byte-sized data bus and a single Data Strobe (DS). There 
is no problem here, as address line ag is provided explicitly to reflect the natu¬ 
ral byte size of the data bus, and thus each target memory byte is individually 
enabled. This is exactly the same as an 8-bit MPU seeing the world through an 
8-bit eye. Nevertheless, the even boundaries restriction for word and long-word 
memory data are retained for compatibility with the 68000 processor. Execu¬ 
tion times for the 68008 are shortest for a byte operand; word and long-word 
operands taking one and three extra access cycles respectively. Fetching the op¬ 
code also takes twice as long. At a clock frequency of 8 MHz, the 68000 moves 
a word to a data register in 2ps, whilst a 68008 takes 4ps. However moving 
between registers; for example: 

M0VE.W D7,D0 ; A register to register move 

takes \ ps in both cases. The moral being to keep as much in the Data registers 
as possible. 

When the data lines and DS signals are enabled, they can drive up to six 74LS 
loads and 130 pF without external buffering. Data lines are high-impedance when 
the processor is halted or in a DMA mode. DS signals are off only during DMA. 


Reset 

Asserting both Reset and Halt together initiates a total Reset of the processor. 
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Figure 3.4 Memory Organization for the 68000. 


This must be held for at least 100 ms when the power is initially applied. This 
ensures stabilization of the internal bias voltage generator and external clock 
source. Otherwise a duration of ten clock cycles is sufficient. 

A total Reset causes the contents of the long-word at 000000 - 3h to be moved 
into the Supervisor Stack Pointer (its initial setting) and long-word 000004 - 7h to 
be moved into the Program Counter (the Restart address, see Fig. l6.Sl) . The Status 
register is also set to Supervisor state (S = 1), Trace off (T = 0) and Interrupt 
mask to 7 (12 II 10 = 11 1). No other registers are affected. 
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Reset can also act as an output signal, activated by the privileged instruction 
RESET. This drives the Reset pin low for 124 clock periods, which can be used 
to reset peripheral devices. Because of this bi-directional action, external restart 
circuitry must be more complex than a simple switch. An example of a typical 
circuit (SJ is shown in Fig. ll3.3l Reset will also be driven low, together with Halt 
when a Double-bus fault occurs, as described in the next paragraph. 

Halt 

Like Reset, this is also a bi-directional line. As an input it can be used in conjunc¬ 
tion with Reset or alone. When asserted alone, it will stop the processor after the 
current instruction is finished. The address and data buses will then be floated, 
and other Control outputs negated. If Halt is then released for one cycle, the 
processor will execute the next instruction and then stop. So, Halt can be used 
to single-step the processor for debug purposes 0. 

As an output, Halt is driven low (together with Reset) when the initial Super¬ 
visor Stack Pointer setting obtained from the Vector table on Reset is odd or a 
Bus Error is active in an exceptional event (see page 11611 . This is known as a 
Double-bus fault. Halting the MPU is the obvious thing to do in these cases, as 
such events are unrecoverable. 

Read/Write (R/W) 

This is low during a Write cycle, otherwise high. It is floated during DMA and, as 
a precaution, normally has a pull-up resistor to prevent erroneous writes during 
this situation. It can drive up to six LSTTL loads into 130 pF. 

Data_Transfer_ACKnowledge (DTACK) 

This is a signal sent back by the addressed device to indicate that the peripheral's 
data is valid during a read cycle and that the peripheral is ready to accept the data 
during a Write cycle. This asynchronous handshake protocol is discussed in detail 
in the next section. 

lnterrupt_Priority_Level (IPL0IPL1 IPL2) 

These input pins are driven from external devices requesting an interrupt. The 3- 
bit active-low code thus placed is its priority level, ranging from zero (111) for no 
interrupt (quiescent state) to seven (000) for a non-maskable top-priority request. 
The I PL pins are constantly monitored, and any change lasting a minimum of two 
successive clock periods is internally latched. At the end of each instruction, the 
latched request level is compared with the Interrupt mask bits setting in the Status 
register and acted upon if higher. If masked to level 7, a change from a lower level 
to level 7 request will trigger an edge-triggered non-maskable interrupt response. 
More details in Section 6.1. 

The 68008 MPU (except in its 52-pin version) internally connects the IPL0 and 
IPL2 lines as shown in Fig. 13.21 This means that only levels where bits 0 & 2 are 
the same (/1 / = 0, 707 = 2, 010 = 5 and 000 = 7) are available. 
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Bus_ERRor(BERR) 

This input acts as a special type of interrupt used to inform the processor that 
something has gone wrong out there. As an example of what can go awry, perhaps 
the addressed peripheral has not sent back its DTACK acknowledge signal. If 
this continues indefinitely, the processor will hang up forever waiting for the 
peripheral to respond. Using a re-triggerable monostable activated by DTACK to 
drive BERR would ensure that in the absence of a correct response, say within 
10 ms, the monostable will relax and alarm the processor. The use of a watch¬ 
dog' timer like this can be extended to ensure that the veracity of the program 
in high-noise situations, which can corrupt data and address lines, causing the 
processor to go off to some illegal memory space and do its own thing. By using a 
few lines of the legitimate program to trigger a watch-dog at some regular interval, 
a Bus Error can be signalled if this area of program is not entered. See Section 6.2 
for more details. If a Bus Error occurs during the Restart process, signalling that 
the Reset vectors cannot be accessed, then the MPU stop with the HALT pin low. 

During normal execution, if the external error-detection circuitry also drives 
the Halt line in the correct fashion, the processor can be persuaded to rerun the 
cycle which caused the error fTol . 

When a Bus Error occurs, the processor pushes data onto the Supervisor stack, 
which can then be used by the operating system for diagnostic purposes. If a Bus 
Error continues to be signalled, then a Double-bus fault is said to have occurred. 
The processor signals this catastrophe by bringing Halt low and stopping. 

Function_Code (FC2 FC1 FCO) 

These three outputs inform the outside world concerning the state of the proces¬ 
sor according to the codes: 


FC2 FC1 

0 0 

0 1 

i 0 


FCO 

0 User state accessing Data memory 

0 User state accessing Program memory 

0 Supervisor state accessing Data memory 

0 Supervisor state accessing Program memory 

1 Interrupt acknowledge 


Being able to distinguish between User and Supervisor states allows the hardware 
engineer to design address decoding circuitry which accesses different RAM and 
ROM chips. Knowing that an interrupt is being serviced is useful in cancelling 
the request, as discussed in Section 6.1. 

Function_Code outputs can drive up to four LSTTL loads into 130 pF. They go 
high impedance on DMA. 


Bus_Request(BR) 

External devices that wish to take over the buses for direct memory access (DMA) 
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do so by asserting BR for as long as necessary. Tie high if not being used. 

Bus_Grant(BG) 


The 68000 asserts BG in response to a bus request. Once the Address_Strobe is 
negated, the DMA device can take over the buses. 

Bus_Grant_ACKnowledge (BGACK) 

Before taking over the buses, the DMA device checks that no other DMA device 
is asserting BGACK. If it is, the new device waits until BGACK is negated before 
asserting its own BGACK and proceeding. All DMA devices have their BGACK 
outputs wire-OR'ed together. The 68008 does not have this handshake input 
(except for the 5 2-pin version) and so can only handle systems where only one 
DMA device is present. 

CLocK (CLK) 

This must be driven by an external TTL compatible oscillator. Small crystal con¬ 
trolled DIL packaged circuits are readily available for this purpose. Rise and fall 
times should be 10 ns or better (8& 10 MHz). Maximum frequency versions of 8 
and 10MHz are readily available with 12.5 and 16MHz (not 68008) variants ob¬ 
tainable. The 68040/68060 can run up to 50 MHz. A typical Read or Write cycle 
needs four clock pulses (see Figs. l3.5l & l3.61 . thus taking between 500ns (8MHZ) 
through to 80ns (50MHz). The 68000/8 has internal dynamic circuitry, so has 
a lower clock frequency bound (2 MHz for the HMOS devices, 4 MHz for CMOS 
versions). 

E 

This output is CLK frequency divided by ten (six low, four high). It is equivalent to 
the same-named signal in the older 6800 and 6809 MPU's (see Figs. ll.3l & ll.4l> . and 
is used when interfacing to the older style specialized 6800-oriented peripheral 
devices. It can drive up to six LSTTL loads at 130pF. 

Valid_Memory_Access (VMA) 

This is also an old-style' 6800 type signal (not 6809). It indicates that the address 
bus data is valid, and is used as an Address Strobe synchronized to E for old-style' 
peripheral devices, such as the 6821 PIA (see Fig. l3.14t . This is not available on the 
68008 MPU, but can be generated with external circuitry im. It is only generated 
when external circuitry asserts the MPU's VPA pin, and then will take some time 
to lock into the E signal. 

Valid_Peripheral_Address (VPA) 

This input, which is usually driven from the address decoder, indicates that the 
location the MPU wishes to communicate with is populated with a 6800-style 
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peripheral, and that a special 6800-type data transfer cycle (using E & VM A) should 
be used. VPA is also used to indicate that the processor should use automatic 
vectoring to respond to an interrupt, as described in Section 6.1. 

Power (V cc & GND) 

The HMOS 68000/8 MPU dissipates 1.5 W maximum at a V cc of 5 ± 0.25 V and 
a mean current of 300 mA. However, current peaks of as high as 1.5 A can be 
expected. The CMOS 68HC000 uses a maximum average current of 25 mA at 
8 MHz (35 mA at 12.5 MHz), but still may require peaks of 1.5 A. These figures do 
not include that taken by any loads. 


3.3 Making the Connection 

Like all microprocessors, the 68000/8 communicates with the outside world via 
its data bus through interface circuitry. The sequence of events during a trans¬ 
action is a consequence of the interplay of the various control signals. However, 
unlike most 8-bit MPUs, the 68000/8 is controlled in an asynchronous manner, 
where the completion of a Read or Write cycle is dependent on the source or 
destination responding with a handshake when ready to go ahead. In the simple 
open-loop synchronous situation, as shown in Fig. 11.51 the transaction is com¬ 
pleted at the end of the clock cycle irrespective of the state of readiness of the 
peripheral. Although it is certainly possible to extend the cycle by freezing the 
clock (in the 6809 by using MRDY), this is very much the exceptional way of acting. 

The closed-loop nature of asynchronous data transfer is clearly shown in 
Fig. 13.51 where feedback lines exist between each peripheral in the system and 
the microprocessor. When contacted (i.e. enabled by the address decoder) the ex¬ 
ternal device responds when ready with a Data_Transfer_ACKnowledge (DTACK) 
signal. Only then will the MPU complete the transaction. 

We will use timing diagrams to look at this sequence of events in more detail, 
both when doing a Read and doing a Write to the outside world. In both cases the 
clock is internally split into eight phases (see Fig.EjJ, each of which initiates some 
micro-action. Based on this division, the sequence of events can be illustrated. 

The Read cycle of Fig. l3.6l shows the address stabilizing early in the cycle, with 
the AS and DS Strobes then being asserted. DS is used as the generic term for 
UDS and LDS; one or both of which are asserted according to the rules of Fig. 13.21 
When the peripheral is ready, it responds by putting its data on the bus and 
asserting its handshake, DTACK. The MPU then proceeds by latching in the data. 
The MPU then terminates the cycle by negating its Strobes. The peripheral then 
responds by removing its data and raising its DTACK. 

In more detail: 

1. The address bus's data will be valid wit hi n t CLAV (Clock Low to Address Valid) 
of the beginning of phase 1 (<p-\). 
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2. The AS and DS strobes are asserted by t CHSL (Clock High to Strobe Low) follow¬ 
ing the start of <p 2- 

3. The peripheral device responds when ready by asserting its DTACK line. If this 
can be done by t ASI (Asynchronous Setup Input) preceding the end of c/> 4 , then 
the cycle will go ahead. Otherwise, the MPU will insert wait states of one clock 
period each (two phases) until DTACK is recognized on the falling edge. 

4. The peripheral must set up its data on the bus no less than t D | CL (Data In to 

Clock Low) before the \_of cp^, to ensure a successful read by the processor. 

5. The AS and DS Strobes are then negated by no more than t CLSH (Clock Low to 
Strobe High) following <p^. 

6. The peripheral has up to two clock periods from this point to negate its DTACK 
and remove its data. 

Function_Code values, not shown in the diagram, are stable for the duration 
of the asserted Strobe signals, as is R/W (high for Read.). 

The Write cycle time sequence shown in Fig. 13.71 is broadly the same as for 
reading. This time data is put on the bus by the MPU, and it is the job of the 
peripheral device to capture this and acknowledge with the DTACK handshake. 
The Data_Strobes are not asserted until the outgoing data is valid; somewhat later 
in this situation than the Address_Strobe; which indicates a valid address. After 
UDS/LDS is negated, the data is taken off the bus, and the peripheral should now 
terminate its handshake. 

In more detail: 

1. The address bus will be valid within t CLA v (Clock Low to Address Valid) of the 
beginning of phase 1 (<pi). 

2. AS is asserted by t C HSL (Clock High to Strobe Low) following the start of <p 2 - 

3. The MPU sends out data on the bus by no later than t CLD0 (Clock Low to 
Data Out) following <p 3 - 

4. The UDS/LDS Strobes are asserted by t CHSL following the start of 0 4 . 

5. The peripheral device responds when ready by asserting its DTACK line. If this 
can be done by t AS | (Asynchronous Setup Input) preceding the end of c/) 4 , then 
the cycle will go ahead. Otherwise the MPU will insert wait states of one clock 
period each (two phases) until DTACK is recognized on the clock \_ . 

6. All Strobes are negated by no more than t CLSH (Clock Low to Strobe High) fol¬ 
lowing 

7. Anytime after this, the peripheral can lift its DTACK handshake. 

8. The MPU lifts its data off the bus by no less than tsHODi (Strobe High to Data Out 
Invalid) after the Strobes negate. This is the time a peripheral has to grab the 
data (including its setup time) after a _J Strobe edge (30 ns for the 8 MHz 
device, 20 and 15 ns for the 10 and 12.5 MHz devices respectively). 

Not shown are the Function_Code settings, which are valid for the duration of 
the AS Strobe, whilst R/W is low for Write as long as the DS is active. 

Designing an address decoder involves the definition of logic which will imple¬ 
ment the Boolean equations describing which combinations (addresses) of input 
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variables (address lines) are to select the various peripheral devices. In this re¬ 
gard the 68000/8 does not differ from that for an 8 -bit processor (see Section 2.3), 
although the larger number of variables is a further inducement to use more so¬ 
phisticated implementations, such as programmable array logic 1121 . This is es¬ 
pecially the case where high speed versions demand small propagation delays. It 
is beyond the scope of this book to discuss the merits and features of the various 
circuitry, reference IT31I gives a good review for the interested reader. 

A rather unlikely, but nevertheless working circuit, is shown in Fig. 13.81 Here 
the 16 Mbyte address map can be considered split into four quarters using a 23 
and a 22 - A 74LS154 4 to 16-line decoder further splits the quarter defined by 
a 23 a 22 = 00 into 16 pages of 256Kbytes each. PageO is again sub-divided into 
eight ' paragraphs' of 16 Kbytes, which are assumed to directly enable the labelled 
devices. In the cases where only a single peripheral interface is indicated, further 
levels of decoding may be used. EPROM_EN combines two of these paragraphs 
using a 74LS08 AND gate, as 27128 EPROM pairs have a 16Kword (32 Kbyte) 
capacity. 

The secondary decoder is qualified by AS. As AS is only asserted when the 
address signals have stabilized, this ensures that there are no spurious outputs 
during times when the address bus is in transition. With AS being asserted ap¬ 
proximately one clock phase after the address is valid, it should be applied to 
the last decoder stage. This allows primary stages to ' get on with it’ as soon as 
possible, and hence reduce the decoder's overall propagation delay. When high 
clock-speed versions of the 68000 are used, AS is commonly fed directly to the 
peripheral or memory's Chip Enable, to further reduce this delay; an example 
being shown in Fig. 1 3.1 1 ( b). 

Address decoding for the 68008 is identical to that for the 68000, but only 
address lines up to aj 9 are available. Thus, a functionally equivalent page 0 split 
could be obtained by replacing the 74LS154 decoderby a 74LS08 AND gate acting 
on ai 9 and ai 3 . 

As we have seen, each peripheral addressed by a 68000 family MPU must re¬ 
ply by asserting the DTACK line when ready. All 68 xxx peripheral devices specifi¬ 
cally designed to function in an asynchronous manner automatically provide this 
handshake signal. An example if this is the 68230 Parallel Interface/Timer (PI/T) 
shown in Fig. 13.131 However, memory chips and elementary interface devices 
such as 3-state buffers and latches do not generate this information. 

In the simplest of situations the 68000/8 MPU will run with its DTACK input 
permanently asserted. No wait states will be inserted into its Read or Write cycle; 
so all memory and peripheral interface must be fast enough to function correctly 
in the allowed time. Figure [T5T61 shows an example of this treatment of DTACK. 

A slightly more sophisticated approach is depicted at the bottom of Fig. 13.81 
Here the pulse actually enabling the relevant device is also fed back to acknowl¬ 
edge readiness. This will activate shortly after AS is asserted, and will thus appear 
well before the end of clock phase 4, and no wait states will be introduced. The 
AND gate used to sum the Enable signals to the relevant interfaces and memory, 
is open-collector. Thus other similar signals from elsewhere in the memory space 
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From MPU 


Feedback 


Figure 3.8 A simple address decoder with no-wait feedback circuitiy. 
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DTACK 



DTACK 


EN = 1 

EN = 0 

00000000 

1 

CK _r 

10000000 

1 

ck _r 

11000000 

1 

ck _r 

11100000 

1 

ck _r 

11110000 

1 

ck _r 

11111000 

1 

ck _r 

11111100 

1 

CK J - 

11111110 

0 

ck _r 

11111111 

0 


Figure 3.9 A DTACK generator for slow devices. 


can be wire-ORed to the one DTACK pin; see Fig. l3.9l The PI/T_EN of Fig. [T8l does 
not take any part in this scheme, as the 68230 provides its own open-collector 
DTACK handshake output (see Fig. I3.12D . 

Although this approach is more flexible than simply grounding DTACK, it 
still assumes that the addressed device is fast enough not to require wait states. 
Where fast 68000 MPUs are used, this is not likely to be the case for all periph¬ 
erals. Peripherals such as EPROMs and LCD interfaces tend to be rather slow. In 
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such situations a delay circuit is needed for each such DTACK reply. This may 
take the form of a monostable, counter or shift register. An example of the latter 
is given in Fig. 13.91 Normally when the device in question is not being accessed, 
DEV_EN is high and all eight flip flops are low. The 74LS05 open-collector buffer 
is then off. When the device is selected, DEV_EN goes low trailing AS by the ad¬ 
dress decoder's propagation delay; thus releasing the register's CLR. As the serial 
inputs are permanently held high, the flip flops will each in turn become logic 1, 
with an advance from Q A to Qh on the rising edge of the 68000's Clock. Assum¬ 
ing that the decoder's and 74LS05's propagation delay plus the 74LS164's setup 
time is less than the difference between AS being asserted and t AS! before the end 
of clock phase 4 (approximately one clock cycle, see Figs. |3.6| and |3.7| i. then wait 
states of between 0 and 7 clock periods are available according to the position 
of the link. Once the logic 1 reaches the link, the 74LS05's output goes low and 
DTACK is asserted. 

Two 74LS377 octal flip flop registers are used in Fig. l3.10l to illustrate the im¬ 
plementation of an elementary 16-bit output port. The registers are both enabled 
by the address decoder, and the data clocked in by one or both Data Strobes, as 


in Fig. 1 3. 71 There is a minimum of t S HDOi between this point and the data becom¬ 
ing invalid. In determining the margin, the hold time (5 ns) for the 74LS377 must 
be subtracted. In the case of the 8 MHz 68000, this gives a worst-case margin 
of 25ns, which shrinks to 10ns for the 12.5MHz version. There is no problem 
meeting the 25ns 74LS377 setup requirement. 

From these figures, it is clear that the Data_Strobes should directly clock the 
registers and not be gated via additional logic. For example; it is tempting to 
use R/W ANDed with UDS/LDS to ensure that an accidental read from this port 
does not latch in irrelevant data. The alternative of using R/W in conjunction 
with OUT_EN is preferable for this purpose. The falling edge of UDS/LDS via an 
inverter or gate cannot be reliably used as the clock, as it is just possible that 
if tcLDO is a ma xi mum and f CH sl is a minimum, the data will not be valid at this 
point. 

In the case of the 68008 MPU, one 74LS377 will give an 8-bit output port, with 
DS acting as the clock (see Fig. 113.11) . The same timing considerations hold. 

The 6264 is a static CMOS 64 Kbit RAM organized as an 8K x 8 array. It is 
commonly available in 100, 120 and 150ns access time selections. Taking the 
Hitachi HM6264CP-10 as an example of a 100 ns device; the access time defining 
the minimum period from a stable address and device enabled (CS1 = 0, CS2 = 
1) before data becomes valid during a Read. When writing, the address must 
be stable for the full 100 ns and for at least 80 ns of this time the device must 
be enabled and R/W = 0 for a successful Write-to action. The address must 
remain stable for at least 5 ns after CS1 or R/W change state, or 15 ns after CS2 
deactivates. 

Referring to Fig. l3.lll a). we see that two broadside 6264s provide the 16 bits 
at each word address. As there is no ao byte address bit available from the 68000 


appropriate (see Fig. l3.4> . The rising edge of the Strobe is the active transition; 6 
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From Address Decoder or to DTACK Generator 



Upper Data Bus Address Bus Lower Data Bus 


(a) Connecting the RAM chips to the buses. 



Omitted from Address decoder 

RAM_R/W 

OE 


(b) An improved scheme for high-speed MPU versions. 


Figure 3.10 A simple word-sized output port. 


MPU, address lines ai -ai 3 drive the A 0 -A 12 RAM inputs, with UDS and LDS ef¬ 
fectively providing the byte selection. 

To determine whether wait states are required in using these devices, we need 
to analyze the timing constraints rm Essentially the RAM is enabled for the 
duration of the Data Strobes. As this is shortest during a Write cycle, we will use 
this as the determining factor. From Fig. 13.71 the worst-case width of UDS/LDS 
is 6 - 4 , or three clock phases - £chdi_J if we assume a minimum t CL s H of zero 


(no figure is given). For the 8, 10 and 12.5 MHz MPUs, this is 120, 90 and 60ns 
respectively. Thus the 80 ns HM6264FP-10 figure is suitable for up to 10 MHz 
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From Address Decoder or to DTACK Generator 



Upper Data Bus Address Bus Lower Data Bus 


(a) Connecting the RAM chips to the buses. 



Omitted from Address decoder 

RAM_R/W 

OE 


(b) An improved scheme for high-speed MPU versions. 


Figure 3.11 Interfacing 6264 RAM ICs to the 68000 MPU. 


systems. Actually we are being unduly pessimistic, as the 68000 data sheet gives 
t dsl (Data Strobe Low) minimum as 80ns for the 12.5 MHz MPU. For the Read 
cycle, 160 ns is the equivalent 12.5 MHz figure, rising to 240 ns for the 8 MHz 
version. 


We have assumed that the propagation delay through the address decoder is 
such that RAM_EN is asserted before the Data Strobes. During a Write cycle this 
is the time between 4 and 2 in Fig. 15.71 around one clock cycle. In the case of 


a Read cycle, the propagation delay must be subtracted from the t DSL time that 
the Data Strobes are low. In higher speed circuits, this propagation delay can be 
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Figure 3.12 Fast EPROM interface. 
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minimized by omitting AS from the address decoder and using it to qualify the 
R/W signal, as shown in Fig. 13 .111 b). This is more economical than qualifying the 
RAM_EN signal, as the modified R/W (i.e. RAM_R/W) can be used for any number 
of RAM chips. The inverted MPILR/W is normally used in this situation to turn 
off the output 3-state buffers during a Write, by activating Output_Enable (OE). 
Turn-off time is quicker from OE than from the RAM's Chip Select or R/W. 

EPROMs cause problems as they tend to be very much slower. A typical 27128 
16Kx8 EPROM has a 250 ns access time from stable address/asserted Chip_Select. 
Even at 8MHz, there is only 235 ns from the falling edge of AS until within the 
setup time before the end of <£g (5 x cycles - Ichsl — t docl) ■ Fortunately, the time 
from Output_Enable (OE) to data valid is much less; for example 100 ns for the 
Hitachi HN4827128AG-25; and the circuit of Fig. 13.121 makes use of this means 
of access. Here CS is enabled whenever R/W is high; that is, during each Read. 
The R/W signal is valid no later than 70 ns after </>q, which gives around 350 ns 
enabling time to the end of cp^, less setup time t D!CL ([T] in Fig. 13.61 ). Provided 
that the EPROM's OE is enabled at least 100 ns prior to this endpoint, a successful 
Read will occur. As the time between AS enabling the address decoder and this 
point is 235 ns, 135 ns is left to more than adequately cover this delay. 

Faster CMOS EPROMs, such as the 150 ns National Semiconductor NMC27C64 
(60 ns from OE) facilitate no-wait state operation for faster processors. Alterna¬ 
tively the contents of slow EPROM could be transferred ' lock-stock and barrel’ 
to fast RAM at the beginning of the program, and the EPROM henceforth ignored. 
This technique is frequently used in IBM PCs, where the BIOS is shadowed in RAM 
during the booting process. 

RAM and ROM are interfaced to the 68008 MPU in the same way, but this time 
the MPU provides the byte-address bit ao, and this goes to the memories' Aq line. 
DS replaces UDS and LDS, see Fig. 113.31 

The 68000 family are supported by a series of dedicated peripheral interface 
devices. The 68230 Parallel Interface/Timer (PI/T) is typical of these, providing 
three 8-bit peripheral ports, two with handshake, and sharing functions with an 
internal timer together with interrupt facilities. As shown in Fig. 13.131 interfacing 
is straightforward, with a Data Strobe enabling the device together with the ad¬ 
dress decoder output. DTACK is internally generated and is connected directly to 
the MPU's DTACK node. Handshaking for the Interrupts (one for the parallel in¬ 
terface PIRQ/PIACK and one for the timer TOUT/TIACK) is provided, as described 
in Chapter 6. 

There are 25 internal registers addressed by the five Register Select inputs 
(RSI - RS5). As shown driven by address lines ai - a 5 , they will appear at alternate 
byte addresses. Although this presents little inconvenience, a special instruc¬ 
tion, M0VEP, can transfer two or four bytes at alternate addresses to suit this 
arrangement. 

The two main peripheral ports can be set up to act as one 16-bit port, although 
the rather strange decision to use an 8-bit data bus means that two cycles are 
needed to transfer the data word. Programming the 68230 is complex and beyond 
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Figure 3.13 Interfacing the 68230 PI/T to the 68000's buses. 
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the scope of this text; see reference 1151 for a good description. 
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Figure 3.14 Interfacing a 6821 Peripheral Interface Adapter to the 68000. 

When the 68000 MPU was first released in 1979, the decision was taken to pro¬ 
vide an operating mode to allow its use with the existing 68xx family of peripheral 
interface devices. This would ensure that the MPU was immediately useful with¬ 
out having to wait for further device introductions. We have already met the 
6821 PIA in Fig. ll.9l and Fig. 15.141 shows this device in the alien environment of 
the 68000. 

Essentially a 68xx device prompts the 68000 MPU about its special status by 
asserting the latter's VPA input, rather than DTACK; as shown in Fig. 13.81 The 
Read and Write cycles are then synchronized to the E clock to give the normal 
6800/6809-type synchronous data transfer sequence. The Valid_Memory_Address 
(VMA) status output is used as an Address Strobe in this mode. DTACK should 
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not be asserted during this time. As E is the 68000's clock divided by ten, then 
the normal 1MHz 6821 version is adequate up to 10 MHz systems. The 1.5 MHz 
68A21 is suitable for the 12.5 MHz 68000 MPU. 
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CHAPTER 4 


The 68000/8 Microprocessor: Its 
Software 


Although the 68000 architecture represents a complete break with its progenitor 
6800 family; its software is in reality an evolution rather than a break from ear¬ 
lier implementations. Many of the characteristics exhibited by the 6809 instruc¬ 
tion set (see Chapter 2) also appear in 68000 software, and indeed this is not 
surprising as they both support high-level language compilation, with extensive 
stack-oriented operations and a large repertoire of computed address modes. 

The use of a full 16-bit op-code allows considerable scope in handling the 
many instruction:op-code:register combinations. Nevertheless, a special effort 
was made to make the assembly-level software user friendly. There are only 
56 primary instructions Q], although variations on themes of several of these add 
another 29 mnemonics (eg. MOVE and M0VEQ for MOVE and MOVE Quick). Most 
instructions are orthogonal, in that they apply to all registers within a group (Data 
or Address) in the same manner. The ' rules of grammar' are fairly consistent 
across the range of instructions with relatively minor quirks 0. 

In this chapter we look at the more important of the instructions and their 
address modes. We will tie these together with the same example subroutines 
used to illustrate 6809 software in Section 2.3. The same assembler will be used 
here, details of which were given at that point. 68008 software is identical to that 
for the 68000 (except that only the lower twenty address bits are significant) and 
we will use the term 68000 as generic of the two. 

It would take a complete book, rather than a single chapter, to do justice to 
assembly-level programming for such a complex processor. References 0211s Si 6]| 
are recommended to the interested reader. 


4.1 Its Instruction Set 

We will briefly look at the machine-code structure of 68000 instructions at the 
end of the next section. As far as assembly level is concerned, instructions may 
be classified as three kinds; that is, inherent, single- and dual-operand. 

Inherent instructions have no operand, and are represented by mnemonic 
only, for instance the instruction Return from Subroutine: 
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RTS ; Program counter is pulled from System stack {Coded as 4E75h} 

Single-operand (or monadic) instructions, such as CLeaR, have only one entry 
in the operand field, for example: 

CLR.B OEOOOh ; [E000] <- 00 {Coded as 4439-0000-E000h} 

CLR.L DO ; [D0(31:0)] <- 00000000 {Coded as 4480h} 


Dual-operand (diadic) operations such as Move have the form: 

Mnemonic <Source operand>,<Destination operand> 
For example: 


MOVE.L DO,D1 

MOVE.B 4000h,OEOOOh 

MOVE.W DO,OEOOOh 


[D1C31:0)] <- [DOC31:0)] 
[E000] <- [4000] 

[E000:1] <- [D0(15:0)] 


{Coded as 2200h} 

{Coded as 03F9-4000-O000-E000h} 
{Coded as 33CO-OOOO-E0O0h} 


Data Movement is the the most common operation executed. Reference U\ 
reports a frequency count of about 33% for MOVE, and it is with this in mind 
that we start with Table [-TT1 Here we can see that only three mnemonics cover 
the range (see also EEA and PEA in Table P~2l) . Of these the chief is MOVE, which 
subsumes the Load and Store operations of the 6809 MPU. MOVE is so frequently 
used that Motorola made it the most flexible of all the 68000 operations, a true 2- 
address instruction. Data in 8-, 16- or 32-bit packets canbe copied from anywhere 
in memory, any register (except the PC) or immediately to any alterable memory 
or to any register (except PC). All other 2-operand instructions must specify a 
register as the source and/or destination, for instance ADD. B OCOOOh, DO. 

The M0VEA variation of the plain MOVE instruction must be used where an 
Address register is the destination. For example: 

M0VEA.L #0C000h,A0 ; [A0(31:0)] <- OOOOCOOO {Coded as 207C-O000-C000h} 

Like all specific Address register-destination operations, the CCR flags are not 
altered, and only word and long-word sizes are permitted. Word-sized operands 
are sign extended to 32 bits, for example: 

MOVE.W #0C000h,A0 ; [A0(31:0)] <- FFFFC000 {Coded as 307C-C000h} 

The state of the CCR flags can be set up using the MOVE <ea>, CCR variant 
(some assemblers use the non-standard mnemonic MTCCR for Move To CCR). 
Notice that its size is word only (the . W is usually omitted) although the CCR is 
byte sized. The Status register equivalent is MOVE <ea>,SR (or MTSR <ea>), and 
is only legal in the Supervisor state, that is privileged; but a Move From the SR, 
MOVE SR,<ea> (or MFSR <ea>), can be made from anywhere. The Move From 
the CCR is only available on the 68010 MPU and higher family members. 

The MOVE Quick (M0VEQ) instruction is targeted exclusively to the Data regis¬ 
ters. It is used to setup a 32-bifData register to a fixed long number between+12 7 
and -128 (signed 8-bit). Of course an ordinary MOVE can be used, but as the im¬ 
mediate data is included in the op-code for M0VEQ, the latter's execution is much 
faster, as shown here: 
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Table 4.1 Move instructions. 

Flags 


Operation 

Mnemonic 

X 

N 

z 

V 

c 

Description 

Move 

data 

MOVE.s3 

eal,ea2 

• 

V 

V 

0 

0 

Data, source to destination 
[ea2] <- [eal] 

to Address reg. 

MOVEA.s2 

ea, Dn 

• 

• 

• 

• 

• 

[An] <- [ea] 

quick 

MOVEQ 

#±ds,Dn 

• 

V 

V 

0 

0 

[Dn] <- #±d8 

regs to memory 

M0VEM.s2 

SRn.ea 

• 

• 

• 

• 

• 

[-ea] <- £Rn 

memory to regs 

M0VEM.s2 

ea,XRn 

• 

• 

• 

• 

• 

ZRn <- [ea+] 

to CCR 

MOVE.W 

ea,CCR 

V 

V 

V 

V 

V 

[CCR] <- [ea] 

to SR 

MOVE.W 

ea, SR 

V 

V 

V 

V 

V 

[SR] <- [ea] , privileged 

from SR 

MOVE.W 

SR, ea 

• 

• 

• 

• 

• 

[ea] <- [SR] 

Exchange 

EXC.L 

R1,R2 

• 

• 

• 

• 

• 

Switch two registers 
[R2] <-> [Rl] 

Swap 

SWAP 

Dn 

• 

V 

V 

0 

0 

Switch lower/upper words 
[D C3 1 : 16) ] <--> [D(15 : 0) ] 


0 

Flag always reset 

Rn 

Data or Address register n 

1 

Flag always set 

An 

Address register n 

• 

Flag not affected 

Dn 

Data register n 

V 

Flag operated in the expected way 

Dn(x:y) 

Data register n, bits x to y 

s3 

Three sizes, . B, .W, . L 

#±ds 

Signed 8-bit value 

s2 

Two sizes, . W, . L 

[ 1 

Contents of 

ea 

Effective Address or immediate data 

<- 

Becomes 


MOVE.L #1,DO ; [D0(31:0)] <- 00000001 (12~) {Coded as 223C-0000-0001h} 

MOVEQ #1,DO ; [D0(31:0)] <- 00000001 (4~ ) {Coded as 7001h} 

MOVEQ #-l,DO ; [D0(31:0)] <- FFFFFFFF (4~ ) {Coded as 70FFh} 

where ~ indicates clock cycle. Thus the ordinary MOVE takes 1.5 ps at an 8 MHz 
clockrate against 0.5 ps for a MOVEQ. The timings for the 68008 MPU are 24~ (3 ps) 
and 8~ (1 vs) respectively. Note that all 32 bits of the Data register are affected. 
There is no MOVEQ. B or MOVEQ. W; an ordinary MOVE must be used in cases where 
only the lower 8 or 16 bits are to be setup. 

Using a regular MOVE with the appropriate address mode gives the equivalent 
of a Push or Pull operation; for example: 

MOVE.L DO,-(SP) ; Same as PSHS DO (14~) {Coded as 2F00h} 

pushes all of DO out to the System stack, after the System Stack Pointer A7 has 
been decremented four bytes, and 

MOVE.L (SP)+,DO ; Same as PULS DO (12~) {Coded as 201Fh} 

pulls four bytes off the System stack into D0.L and then increments the System 
Stack Pointer. The actual System stack used depends on whether the MPU is in the 
Supervisor or User mode, the assembler allowing the use of the mnemonic SP or, 
indeed A7, for either System Stack Pointer. Note that a MOVE. B to/from the System 
stack always results in a word being transferred, to preserve the evenness of the 
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System Stack Pointer (i.e. A 7). Any of the other Address registers may be used in 
place of A7. Pre-Decrement and Post-Increment address modes are discussed in 
the next section. 

As there are 16 registers which may have to be pushed or pulled, clearly a 
single instruction which can save or retrieve any or all Address and Data registers 
at one go will be more efficient. The MOVE Multiple instruction fulfils this task; 
for example: 

MOVEM.L D2/D3/D4/A2,-(SP) ; Same as PSHS D2,D3,D4,A2 (40~) {Coded as 48E7-3820h} 

pushes all of D2,D3, D4 and A2 out to the System stack, the System Stack Pointer 
ending 16 bytes down; and 

MOVEM.L (SP)+,D2/D3/D4/A2 ; Same as PULS D2,D3,D4,A2 (44~) {Coded as 4CDF-041Ch} 

pulls the register contents back out, restoring the System Stack Pointer to its 
original value. Any Address register can be used in place of A7. In general, the 
time taken for a multiple Push is 8 + 8n~ and multiple Pull is 12 + 8n~, where 
n is the number of registers involved. Thus to Push a full register complement 
takes 132 clock cycles (16.5 /is at 8 MHz) against 224 clock cycles and 32 bytes 
of program memory using ordinary MOVEs. 

The MOVEM instruction uses a post-word to the op-code to indicate which regis¬ 
ters are involved, as shown in Fig. 14.11 If less than the full complement is involved, 
then the order of storage in the stack is still that shown in the register list. There 
is a word-sized MOVEM which only transfers the lower register words. This saves 
stack space and time; however, on return all registers — both Data and Address — 
are filled with the sign-extended long version of the stored word. 

Less usefully, a fixed address can be used as MOVEM's address mode instead 
of Pre-Decrement (registers to memory) or Post-Increment (memory to registers). 
In this case no pointer marks the bottom of the dump, and the same address is 
used for both directions. 

EXchanGe (EXG) swaps around the complete 32-bit contents of any two regis¬ 
ters, Data or Address. SWAP acts only on Data registers, and exchanges the lower 
and upper words. This is useful, for example, when using the Division opera¬ 
tion, which produces a 16-bit quotient in the lower part of a Data register and 
the remainder in the upper 16-bits. Using SWAP makes getting at the remainder 
easier (see Table l4~T2h The 68020 MPU has a byte-sized SWAP which exchanges 
the lower two bytes. The 68000 can use a R0L.W #8,Dn to perform the same 
function (see Table 14.31 . 

The 68000 provides for Addition, Subtraction, Multiplication and Division op¬ 
erations together with some ancillary instructions. The elementary Addition and 
Subtraction operations are straightforward, with at least one of the operands 
being a Data register, for example: 

ADD.B D0,1234h ; [1234] <- [D0(7:0)] + [1234h]. Add <Source> to <Destination> 

SUB.W 1234h,Dl ; [Dl(15:0)] <- [Dl(15:0)] - [1234:5h]. Sub <Source> from <Destination> 

ADD.L D0.D1 ; [Dl(31:0)] <- [Dl(31:0)] + [D0(31:0)]. Add <Source> to <Destination> 





MOVEMoL D0-D7/A0-A7, - (SP) (Push) 
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MOVEA.L #0F003,SP 



Absolute 

MOVEM.L 0EFFFh»D0-D7/A0-A7 
MOVEM.L D0-D7/A0-A7 » OEFFFh 


Order of transfer 


| DO | D11D21D31 D4| D51D6 |D7| A0| A11 A2| A3| A4j A5| A6| A7j 
Post-word for Pre-Decrement (Push) address mode 

Order of transfer 


13 


11 10 


|A7|A6|A5|A4|A3|A2|A11AO|D7|D6|D5|D4|D5|D2|D1|D0| 
Post-word for Pre-Decrement (Push) address mode 

(b) Register List Mask field. 


(a) Multiple Move interaction with memory. 


Figure 4.1 Multiple moves to and from memory. 
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Table 4.2 Arithmetic operations. 
Flags 


Operation 

Mnemonic 

X 

N 

z 

V c 

Description 

Add 

to Data reg. 

ADD.s3 ea,Dn 

V 

V 

V 

V 

V 

Add source to destination 
[Dn] <- [Dn] + [ea] 

to memory 

ADD.s3 Dn,ea 

V 

V 

V 

V 

V 

[ea] <- [ea] + [Dn] 

to Address reg. 

ADDA.s2 ea,An 

• 

• 

• 

• 

• 

[An] <- [An] + [ea] 

quick 

ADDQ.S3 1 #d 3 ,ea 

V 

V 

V 

V 

V 

[ea] <- [ea] + #d 3 2 

immediate 

ADDI.s3 #kk,ea 

V 

V 

V 

V 

V 

[ea] <- [ea] + #kk 

with extend 

ADDX.S3 Dy,Dx 

V 

V 

3 

V 

V 

[Dx] <- [Dx] + [Dy] + X 


ADDX.S3 -(Ay),-(Ax) 

V 

V 

3 

V 

V 

[-(Ax)] <- [-(Ax)] + [-(Ay)] + X 

Clear 

CLR.s3 ea 4 

• 

0 

0 

1 

0 

Clears destination 
[ea] <- 00 

Divide 

signed 

DIVS ea,Dn 

• 

V 

V 

V 

0 

Generates quotient and remainder (%) 
[Dn(15:0)] <-[Dn(31:0)]- [ea(15:0)] 

unsigned 

DIVU ea,Dn 

• 

V 

V 

V 

0 

[Dn(31:16)]<-[Dn(31:0)]% [ea(15:0)] 

Extend 

word 

EXT.W Dn 

• 

V 

V 

0 

0 

Sign Extend Data register 

[Dn(15:0)] <- [SEX| [Dn(7:0)]] 

long 

EXT.L Dn 

• 

V 

V 

0 

0 

[Dn(31:0)] <- [SEX|[Dn(15:0)]] 

Load Effective Ad 

dress 

LEA ea,An 

• 

• 

• 

• 

• 

Effective Address to Address reg. 

[An] <- ea 

Multiply 








signed 

MULS ea,Dn 

• 

V 

V 

0 

0 

[Dn(31:0)]<-[Dn(15:0)]x±[ea(15:0)] 

unsigned 

MULU ea,Dn 

• 

V 

V 

0 

0 

[Dn(31:0)]<-[Dn(15:0)]x [ea(15:0)] 

Negate 

data 

NEG.s3 ea 

V 

V 

V 

V 

V 

Reverses 2's complement sign 
[ea] <- 00 - [ea] 

with extend 

NEGX.s3 ea 

V 

V 

3 

V 

V 

[ea] <- 00 - [ea] - X 

Push Effective Ad 

dress 

PEA ea 

• 

• 

• 

• 

• 

Effective Address into Stack 
[-SP] <- ea 

Subtract 

from Data reg. 

SUB.s3 ea,Dn 

V 

V 

V 

V 

V 

Subtract source from destination 
[Dn] <- [Dn] - [ea] 

from memory 

SUB.s3 Dn,ea 

V 

V 

V 

V 

V 

[ea] <- [ea] - [Dn] 

from Addr. reg. 

SUBA.s2 ea,An 

• 

• 

• 

• 

• 

[An] <- [An] - [ea] 

quick 

SUBQ.S3 1 #d 3 ,ea 

V 

V 

V 

V 

V 

[ea] <- [ea] - #d 3 2 

immediate 

SUBI. s3 #kk,ea 

V 

V 

V 

V 

V 

[ea] <- [ea] - #kk 

with extend 

SUBX.s3 Dy,Dx 

V 

V 

3 

V 

V 

[Dx] <- [Dx] - [Dy] - X 


SUBX.s3 -(Ay),-(Ax) 

V 

V 

3 

V 

V 

[-(Ax)] <- [-(Ax)] - [-(Ay)] - X 


Note 1: Only Long and Word with Address register destination. Also CCR unchanged. 
Note 2: d 3 is a 3-bit number 1 to 8. 

Note 3: Cleared for non-zero, otherwise unchanged. 

Note 4: Not Address register. 
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In all cases the result is stored at the destination. Notice that in subtraction 
the , can be read as from. When the destination is in memory, then it must of 
course be alterable memory, usually RAM. Amongst the instructions, only MOVE 
can have both operands in memory. 

An Address register is not permitted as a destination, although legal as a 
source. Instead the special instructions ADDA and SUBA are used. As is usual, 
the CCR flags are not changed by any operation that alters an Address register, 
and only word and long-word sizes are permitted. Word results are always sign 
extended to a long-word. 

The ADD immediate Quick and SUB immediate Quick instructions are used 
as a substitute for the missing Increment and Decrement operations. A constant 
between 1 and 8 can be added or subtracted from any Data or Address register 
or read/write memory location, for example: 

ADDA.W #1,A0 ; [A0(31:0)] <- [A0(31:0)] + 1. Increment (12~) {Coded as DOFC-OOOlh} 

ADDQ.W #1,A0 ; [A0(31:0)] <- [A0(31:0)] + 1. Increment ( 8~) {Coded as 5248h} 

SUBQ.B #l,1234h ; [1234h] <- [1234] - 1. Decrement (16~) {Coded as 5338-1234h} 


The constant is encoded as a 3-bit group in the op-code itself. As can be seen 
above, this halves the size of the instruction and therefore decreases execution 
time. If an Address register is targeted, the usual word or long-word sizes are 
permitted, with the latter being sign extended to the whole 32 bits. The CCR flags 
remain unaltered. 

Notice that the last example above altered a memory location directly without 
using a Data register as an intermediary stop. The ADD Immediate and SUB 
Immediate instructions can be used where the data is greater than 8, for example: 

SUBI.W #500h,OCOOOh ; [C000:1] <- [C000:1] - 500h 


Where operands of greater than 32 bits are involved, then several sequential 
Adds or Subtracts may be used to form the multiple-precision sum or difference. 
In most processors the Carry flag provides the linkage between successive op¬ 
erations but, as noted on page ??, the X flag is used for this purpose in the 
68000 family. 

Figure 14.21 shows an example of a 96-bit addition made up of three 32-bit 
operations. The program for this is: 


MOVEA.L #OCOOCh,AO 
MOVEA.L #0C10Ch,Al 
ADD.L -(AO),-(Al) 

ADDX.L -(AO),-(Al) 

ADDX.L -(AO),-(Al) 


Point AO to just before least significant long-word <Source> 
and Al to just before least significant long-word destinations 
Add LSLWs, sum in <Destination> LSLW 
Add NSLWs, sum in <Destination> NSLW 
Add MSLWs, sum in <Destination> MSLW 


One main point to notice here is the use of the Pre-Increment Address Regis¬ 
ter Indirect address mode. As described in the next section, the Address register 
used to point to the operand (like an Index register) is automatically decremented 
by the appropriate number of bytes (by four here) before being used. With the 
arrangement of Fig. 14.21 the address will naturally creep towards the most sig¬ 
nificant bytes as we do each addition. This is the only memory targeted address 
mode that can be used by ADDX and SUBX to access data in memory. Alternatively 
both operands can lie in Data registers. 
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-(AO) 


-(AO) 

j—j 

Source T <> 


i 


cooo 


C001 C002 C003 C004 C005 C0Q6 C007 C008 CQ09 COOA COOB 



(A 1) + —(A1) A1 

I—I I—I 

! _ _ | 

i ; ^ ! ! 

Destination \y 

CIOO C101 Cl 02 Cl 03 Cl 04 C105 C106 C107 C108 C109 C10A C10B 

- 1 - 1 - 1 - 1 - 1 - 1 - 1 - 1 - 1 - 1 - 1 - 1 



MSLW NSLW LSLW 


= sum 


Figure 4.2 Multiple precision addition. 


Wouldn't it be useful if you could tell whether the whole multiple sum or 
difference was zero? A normal Add or Subtract will set the Z flag if the re¬ 
sult is zero otherwise it will clear it; thus the state of Z reflects the last addi¬ 
tion/subtraction. However, ADDX/SUBX does not affect the Z flag when the result 
is zero, otherwise the flag is cleared. Thus setting the Z flag (and also clearing 
the X flag) and using all ADDX or SUBXs will give a final Z setting of 1 only if all 
outcomes in the sequence are zero. Use: 

MOVE #00000100b, CCR ; Clears all flags, except Z = 1 
to set up this condition. 

An Address register cannot be zeroed using CLR; instead use a MOVEA #0, An or 
even SUBA An , An. NEGATE (NEC) is the normal 2's Complement operation (not on 
an Address register), but is rather unusually paired with a NEGATE with extend 
(NECX) instruction, which is used in a similar way to ADDX/SUBX for multiple- 
precision negations. 

The use of Load Effective Address (LEA) to move the result of a 6809 MPU's 
computed address into an Index register has been described in Sections 2.1 and 2.3 
In the 68000 MPU, the destination is any Address register and the similar Push 
Effective Address (PEA) inherently targets the System stack. We will discuss 
computed address modes in the next section, but some examples are: 
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LEA 8(SP),A0 ; [AO] <- [SP] + 8, Point AO to 8 bytes above SP 

LEA -200(PC),A1 ; [Al] <- [PC] - 200, Point A1 to 200 bytes below PC 

PEA 5(A0,D7.L) ; [[-SP] <- [AO] + [D7(31:0)] + 5, Push into Stack the 

contents of AO.L plus 32-bit contents of D7 plus 5 

The middle example illustrates the use of LEA in position independent code (see 
Sections 2.2 and 4.2). 

Signed and unsigned 16 x 16 multiplication is provided as a primitive. The 
Source can be anywhere in memory, a Data register or immediate data, whilst the 
destination must be a Data register, for example: 

MULU OCOOOh,DO ; [D0(31:0)] <- [00(15:0)] x [C000:1] 

MULS #-7,DO ; [D0(31:0)] <- [D0(15:0)] x -7 

MULU D1,D2 ; [D2(31:0)] <- [D2(15:0)] x [Dl(15:0)] 

The Division instructions are more complex. These are designed to divide a 
32-bit dividend by a 16-bit divisor, giving a 16-bit quotient in the lower word of 
the destination Data register and a 16-bit remainder in the upper word of the 
same register. The following code fragment shows how a dividend in D0.L is 
divided by 5000, with the quotient result placed in the the bottom of D6 and the 
remainder in the bottom of D7: 


DIVU #5000,DO 


CLR.L D6 
CLR.L D7 
MOVE.W DO,D6 
SWAP DO 
MOVE.W DO,D7 


Divide the destination by the source 

[D0(15:0)] <- [D0(31:0)] / 5000 (/ symbol is integer division) 

[DOC31:16)] <- [DOC31:0)] % 5000 (% symbol is integer remainder) 

Will hold the quotient 

Will hold the remainder 

16-bit quotient to D6.W 

16-bit remainder in lower DO 

to D7 


Preclearing D6.L and D7.L effectively promotes the word-moved unsigned 
quantities to 32 bits; it can be omitted if the upper 16 bits of these registers 
can be ignored. Alternatively, if DIVS is used, EXT can be utilized for a signed 
extension. Per mi tted operand address modes are the same as for MUL. 

As only 16 bits are reserved for the quotient and the dividend is 32 bits, it is 
possible that overflow will occur. This is especially likely with a small divisor. In 
such cases the V flag will be set. If the source should be zero, then a trap will 
occur, as described in Section 6.2. 

Four types o f Sh ift operation are available, each in a right and left version, as 
shown in Table [431 Any Shift operation can be targeted to a word in read/write 
memory or a Data register. The former is limited to a single shift, for example: 

LSR.W OCOOOh ; Logic Shift Right the contents of C000:1 one place 

Multiple shifts are possible if a Data register is targeted. Fixed shifts of 1 to 8 
places are specified as a 3-bit code embedded in the op-code (like ADDQ). Thus: 

LSR.L #4,DO ; Shift all bits in DO left 4 places 

Alternatively, the number of shifts can be specified dynamically by the lower five 
bits held in another Data register Dx[4:0], For instance: 
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Table 4.3 Shifting instructions. 


Flags 


Operation 

Mnemonic 

X 

N 

z 

V 

c 


Description 


Arithmetic Shift Right 








Linear Shift Right keeping the sign 


memory 

ASR.W 

ea 

bo 

V 

V 

1 

bo 




static Data reg. 

ASR.s3 

#d 3 ,Dn 

bo 

V 

V 

1 

bo 


J 

[x] 

dynamic Data reg. 

ASR.s3 

Dx, Dy 

bo 

V 

V 

1 

bo 


hi ]-[ 

c] 

Logic Shift Right 








Linear Shift Right 


memory 

LSR.W 

ea 

bo 

V 

V 

0 

bo 




static Data reg. 

LSR.s3 

#d 3 ,Dn 

bo 

V 

V 

0 

bo 



~xj 

dynamic Data reg. 

LSR.s3 

Dx, Dy 

bo 

V 

V 

0 

bo 

0 

—| |— 

T] 

Arithmetic Shift Left 








Linear Shift Left 


memory 

ASL.W 

ea 

bm 

V 

V 

1 

bm 




static Data reg. 

ASL.s3 

#d 3 ,Dn 

bm 

V 

V 

1 

bm 

E 



dynamic Data reg. 

ASL.s3 

Dx, Dy 

bm 

V 

V 

1 

bm 

\c_ 

hi |— 

0 

Logic Shift Left 2 








Linear Shift Left 


memory 

LSL.W 

ea 

bm 

V 

V 

0 

bm 




static Data reg. 

LSL.s3 

#d 3 ,Dn 

bm 

V 

V 

0 

bm 

E 



dynamic Data reg. 

LSL.s3 

Dx, Dy 

bm 

V 

V 

0 

bm 

c 

hi |— 

0 

ROtate Right 








Circular Shift Right 


memory 

ROR.W 

ea 

• 

V 

V 

0 

bo 




static Data reg. 

R0R.S3 

#d 3 ,Dn 

• 

V 

V 

0 

bo 


h <= \ 


dynamic Data reg. 

R0R.S3 

Dx, Dy 

• 

V 

V 

0 

bo 


1 1- 

■ c 

ROtate Left 








Circular Shift Left 


memory 

ROL.W 

ea 

• 

V 

V 

0 

bm 




static Data reg. 

ROL.S3 

#d 3 ,Dn 

• 

V 

V 

0 

bm 


/ => \ 


dynamic Data reg. 

ROL.S3 

Dx, Dy 

• 

V 

V 

0 

bm 

0 

hi 


ROtate Right with eXtem 

i 







Circular Shift Right through X 


memory 

ROXR.W 

ea 

bo 

V 

V 

0 

bo 




static Data reg. 

ROXR.s3 #d 3 ,Dn 

bo 

V 

V 

0 

bo 




dynamic Data reg. 

ROXR.s 3 Dx,Dy 

bo 

V 

V 

0 

bo 


1 1- 

■ c 

ROtate Left with extend 








Circular Shift Left through X 


memory 

ROXL.W 

ea 

bm 

V 

V 

0 

bm 




static Data reg. 

ROXL.S3 #d 3 ,Dn 

bm 

V 

V 

0 

bm 




dynamic Data reg. 

ROXL.S3 Dx,Dy 

bm 

V 

V 

0 

bm 

h 

4 - 



Note 1: Set IF most significant bit, b m , changes, ELSE cleared. 
Note 2: Identical with ASR except V flag cleared. 




MOVEQ #18,D7 
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[D7.L] <- 00000012h 
Sometime later 

Shift all bits in DO left by [D7[4:0]], i.e. 


LSR.L D7,DO 


18 


As well as being able to specify a shift number larger than eight, this type of 
specification has the advantage of variability, as it can be changed dynamically 
in software as conditions warrant, for example in a loop. 

The Logic Shift instructions simply shift in Os from the left or right as appro¬ 
priate, with the emerging bit being caught by flags C and X. Arithmetic Shift 
Left and Logic Shift Left are the same, except that the V flag is set if the MSbit 
changes. If the operand is a signed number, this would signal a sign change, for 
instance 0,1 0011110 — 1,00111 00. In the case of Arithmetic Shift Right, 
the sign bit propagates right; thus 7,11101 00 b (-12) becomes 7, /11 1 01 0b (-6) 
becomes 7,7 7111 01 b (-3) etc. and 0,0001 1 00 b (+12) becomes 0,00001 1 0b (+6) 
becomes 0,000001 1 b (+3) etc. 

ROtate through the extend instructions (R0XL, R0XR) are similar to ADD with 
eXtend, in that they can be used for multiple-precision operations. A ROtate 
through eXtend takes in the X flag from any previous Shift and in turn saves 
its ejected bit in X. As an example, a 48-bit number stored as three consecu¬ 
tive 16-bit words in memory 


47 


M 


32 


be shifted once right as follows 151: 


31 


M+2 


16 


15 


M+4 


can 


LSR 

M 

; o - 

=> 

M 

b 3 2 - 

X 

R0XR 

M+2 

; b 32 /[x] — 

=> 

M+2 

b 16 — • 

X 

R0XR 

M+4 

; b 16 /[x] - 

=> 

M+4 

b 0 - 

X 


True circular Rotates are provided, where the shift is not through a flag (al¬ 
though the C flag still catches the emerging bit). This emerging bit is copied into 
the other end of the operand word. Thus: 


R0R.W #8,DO ; [D0(15:8)] <- [00(7:0)], [D0(7:0)] <- [D0(15:8)] 


moves the lower byte of DO up eight places and the next higher byte around to be 
the new lower byte. This is the equivalent of SWAP. W DO (only SWAP. L is available, 
except in the 68020 MPU and up). 

The three binary logic op erati ons AND, OR, Exclusive-OR (EOR) and NOT are 
provided, as shown in Table l4~4l The first two can bitwise operate on any Data 
register or alterable memory location. EOR (rather inconsistently) can only use a 
Data register as target. All three have an Immediate variant that can target an 
alterable memory location directly or be used to change any bit or bits in the CCR 
or SR (the latter only in the Supervisor state), for example: 

ANDI.B #11111110b,CCR ; Clear Carry flag, others unchanged 

NOT is a single-operand instruction that inverts all 8, 16 or 32 bits in either a 
Data register or alterable memory. Some assemblers use COM (COMPLEMENT) as 
the mnemonic for this instruction. 
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Table 4.4 Logic Instructions. 

Flags 


Operation 

Mnemonic 

X 

N 

z 

V 

c 

Description 

AND 








Logic bitwise AND 

to Data register 

AND.sB 

ea, Dn 

• 

V 

V 

0 

0 

[Dn] <- [Dn] ■ [ea] 

to memory 

AND.sB 

Dn, ea 

• 

V 

V 

0 

0 

[ea] <- [ea] ■ [Dn] 

immediate 

ANDI.S3 

#kk,ea 1 

• 

V 

V 

0 

o 2 

[ea] <- [ea] ■ #kk 

EOR 








Logic bitwise EXclusive-OR 

to Data register 

E0R.S3 

ea, Dn 

• 

V 

V 

0 

0 

[Dn] <- [Dn] © [ea] 

immediate 

E0RI.S3 

#kk,ea 1 

• 

V 

V 

0 

o 2 

[ea] <- [ea] © #kk 

NOT 

NOT.s 3 

ea 

• 

V 

V 

0 

0 

[ea] <- [ea] 

OR 








Logic bitwise OR 

to Data register 

OR. s3 

ea, Dn 

• 

V 

V 

0 

0 

[Dn] <- [Dn] + [ea] 

to memory 

OR. s3 

Dn, ea 

• 

V 

V 

0 

0 

[ea] <- [ea] + [Dn] 

immediate 

0RI.S3 

#kk,ea 1 

• 

V 

V 

0 

o 2 

[ea] <- [ea] + #kk 


Note 1: Any alterable memory location, Data register, CCR or SR (privileged). 
Note 2: With destination CCR or SR, all flags altered accordingly. 


Being able to get at individual bits of an operand directly is considered impor¬ 
tant for microcontrollers GD, but rather unusual in 16/32-bit MPUs. The 68000 
MPU has four such instructions, listed in Table PPl which can clear, set or toggle 
any bit in a byte of alterable memory, or any of the 32 bits in a Data register. The 
bit number may be defined as a static i mm ediate operand or dynamically held 
in another Data register (like the Shift instructions). All three instructions also 
affect the Z flag giving the state of the targeted bit before the operation. 

The final instruction BTST does not alter the bit in question, but the Z flag still 
ends up reflecting its state; thus the code fragment: 

LOOP: BTST #6,08080h ; How is the state of bit 6 in location 8080h? 

BEQ LOOP ; If it is still zero try again 

circulates in a tight loop waiting for bit 6 of memory location 8080 h to change 
to logic 1. This may be the Control register of a PIA, and thus effectively the 
program will be waiting for the active edge of handshake line CA2 (programmed 
as an input) to occur. Of course if that event never occurs, due to a hardware 
fault, then the system will hang up indefinitely. More about that later. 

Strictly speaking BTST should be classified as a Data testing instruction, its 
purpose being not to change the operand but to sense its state, which is reflected 
in the Z flag to be used later by a Conditional Branch. The two other such instruc¬ 
tions are CoMPare (CMP) and TeST (TST), as shown in Table [TGl A CoMPare does 
a subtraction of the source operand from the destination operand (as does SUB), 
setting the flags accordingly but not putting the difference into the destination. 
A TeST for zero or negative is just a CoMPare with a zero source operand 
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Table 4.5 Bit-level instructions. 

Flags 


Operation 

Mnemonic 

X 

N 

Z 

V C 

Description 

Bit Test and Change 







Z = b n . Toggle bit n 

dynamic 

BCHC Dx,ea 1 

• 

• 

b n 

• 

• 

b[Dx] <- b[Dx] 

static 

BCHC #kk,ea 1 

• 

• 

bn 

• 

• 

b#kk <_ b^k 

Bit Test and Clear 







Z = b n . Clear bit n 

dynamic 

BCLR Dx,ea 1 

• 

• 

b n 

• 

• 

b[Dx] <- 0 

static 

BCLR #kk,ea 1 

• 

• 

bn 

• 

• 

b#kk <- 0 

Bit Test and Set 







Z = b n . Set bit n 

dynamic 

BSET Dx,ea 1 

• 

• 

b n 

• 

• 

r D" 

O 

A 

1 

I— 1 

static 

BSET #kk, ea 1 

• 

• 

bn 

• 

• 

b#kk <- 1 

Bit Test 







Z = b n . Test bit n 

dynamic 

BTST Dx,ea 1 

• 

• 

b n 

• 

• 

No change except in Z 

static 

BTST #kk, ea 1 

• 

• 

bn 

• 

• 

No change except in Z 


Note 1: Size is Byte if ea is out in memory, else Long if a Data register. 


(i.e. TST DO is the same as CMP #0, DO). 

There are four varieties of CoMPare available. The plain vanilla' CMP can 
use any memory contents, immediate data, Data register or Address register as 
source to be compared with a Data register, for example: 

CMP.W #56,DO ; Compare [D0(15:0)] with the number 56, [D0(15:0)]-56 

CMP.B 123h,Dl ; Compare [Dl(7:0)] with the contents of 123h, [Dl(7:0)]-[ 123h] 

CMP.L AO,D2 ; Compare [D2(31:0)] with [A0(31:0)], [D2(31:0)]-[A0(31:0)] 

Notice the comparison is destination with source, just as SUB is subtract source 
from destination. Some processor assemblers, such as for the PDP-lf minicom¬ 
puter and 80x86 family MPUs, reverse the order. 

CM PA is used with Address register destinations. Unlike other such targeted 
instructions (e.g. ADDA), the CCR flags are set normally, but with word-length 
source operands sign extended in the usual way to a long-word, for example: 

CMPA.W #8000h,A0 ; [A0(31:0)] is compared with FFFF8000h (-32,768) 
CMPA.L 1234h,Al ; Compare [Al(31:0)] with [1234:5:6:7] 

CMPA.L D0,A2 ; Compare [A2(31:0)] with [D0(31:0)] 

An immediate quantity can be compared to any alterable memory or Data 
register by using CMPI, for example: 

CMPI.B #64,1234h ; Compare [1234h] with 64 

Memory can be directly compared to memory with a CoMPare Memory (CMPM). 
In this case only the Post-Increment address mode is available, as CMPM is pri¬ 
marily designed as a Block-Compare primitive. For instance, the following code 
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Table 4.6 Data testing instructions. 

Flags 


Operation 

Mnemonic 

X 

N 

z 

V 

c 

Description 

Compare 

Data reg. with 

CMP.S3 1 ea,Dx 

• 

V 

V 

V 

V 

Non-destructive [destn] - [source] 
[Dx] - [ea] 

Addr. reg. with 

CMPA.S2 ea,Ax 

• 

V 

V 

V 

V 

[Ax] - [ea] 

Mem. with const. 

CMPI.S3 #kk,ea 

• 

V 

V 

V 

V 

[ea] - #kk 

Mem. with mem. 

CMPM.s3 (Ay)+,(Ax)+ 

• 

V 

V 

V 

V 

1—1 

i—i 
> 
X 

1_1 

+ 

1_1 

1 

1—1 
> 

1_1 

+ 

1_1 

Test for Zero or Minu 

s 

TST.s3 ea 2 

• 

V 

V 

0 

0 

Non-destructive [destination] - 0 
[ea]-00 


Note 1: Only Word and Long if source is Address register. 

Note 2: Only alterable memory and Data register, not Address register. 


fragment exits with the address+1 of the first pair of bytes which differ in two 
blocks of data or strings: 


MOVEA.L #BL0CK_1,AO 
MOVEA.L #BL0CK_2,A1 
CL00P: CMPM.B (A0)+,(A1)+ 
BEQ CL00P 


Point AO to bottom of Block 1 

Point A1 to bottom of Block 2 

Compare bytes and move each pointer on one 

IF same THEN next 

ELSE continue 


The TeST primitive is represented by the TST instruction. This can check 
that the contents of any memory location or Data register is zero (sets Z flag) or 
negative (sets N flag), for example: 


TST.B 1234h ; Test contents of 1234h for zero or negative 

TST.W DO ; Test lower 16 bits of DO for zero or negative 

The Block-Test code fragment above followed the Comparison operation by 
the Conditional Branch (BEQ). Branch instructions add an offset to the Program 
Counter if the condition is True (Z = 1 in the example) otherwise its state remains 
pointing to the following instruction. There is also an Unconditional Branch, BRA, 
which always adds the offset. Two sizes of Branches are available, Short (or byte) 
which carries an 8-bit signed offset as part of the op-code, and Word, where a 
16-bit signed offset follows the op-code word. The 68020 allows a Long Branch. 

There are 14 combinations of the C, Z, N and V flags which can be used as 
a test for a Conditional Branch. The X flag is reserved exclusively for multiple- 
precision arithmetic and does not take part in this exercise. With the exception of 
the somewhat useless BRanch Never (BRN), all 6809 Conditional Branches listed 
in Table I2T>1 are also available to the 68000 family. The mathematical significance 
of the various flag combinations are given on page [28] and will not be repeated 
here. In Table 14.71 these tests are listed as 4-bit code combinations (cc). All 
Branch op-codes start with 011 Oh followed by the cc code, followed on by the 
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Table 4.7 Instructions which affect the Program Counter. 


Operation 

Mnemonic 

Description 

Unconditional Program Transfer 

Always goto 

Branch to Label 

BRA Offset 1 

Offset always added to PC, relative goto 

Jump to Label 

JMP ea 

[PC] <- ea, absolute goto 

Conditional Program Tran 

sfer 

Goto IF condition is True 

Branch to Label 

Bcc 2 Offset 

Offset added onto PC IF condition is met 

Test, Decrement & Branch 

DBcc 2 Dx,Offset 

Repeat loop until any condition is met 

No Operation 

NOP 

IF condition is True THEN exit loop 

ELSE 

[Dx(15:0)] <- [Dx(15:0)] - 1 
IF[Dx(15:0)] = -1 is True THEN exit loop 
ELSE 

[PC] <- [PC] + Offset (continue loop) 

Does nothing except increment PC by 2 
[PC] <- [PC] + 2, takes 4~ 


Note 1: Normally a label is specified here and the assembler works out the offset. 
Note 2: The condition codes (cc) are: 


True on True on 


0000 

T 3 

True always 

Always 

1000 

VC 

overflow Clear 

V = 0 

0001 

F 3 

False always 

Never 

1001 

VS 

overflow Set 

V = 1 

0010 

HI 

Higher than 

C+Z = 0 

1010 

PL 

PLus 

N = 0 

001 1 

LS 

Lower or Same 

C+Z = 1 

1011 

MI 

Minus 

N = 1 

0100 

CC 

Carry Clear 

C = 0 

1100 

CE 

Greater or Equal 

N®V = 0 

0101 

CS 

Carry Set 

C = 1 

1 101 

LT 

Less Than 

N®V = 1 

0110 

NE 

Not Equal 

Z = 0 

1110 

CT 

Greater Than 

N®V-Z = 1 

011 1 

EQ 

EQual 

Z = 1 

1 1 11 

LE 

Less or Equal 

N®V-Z = 0 


Note 3: Only for DBcc. 


the 8-bit displacement if Short or all zeros if Word. In the latter case the 16- 
bit displacement follows the op-code. Thus the instruction BPL .06 (Branch if 
PLUS six places on) is coded as 01 10-101 0-000001 1 0b (6A06 h). 

In the 68000 family the cc tests can be used with other instructions, the most 
useful of which are the Decrement, Test and Branch loop operations. We have 
already used software loops, for example the Block-Compare routine on page!99l 
Essentially a loop is a mechanism in which a section of code can either be repeated 
a fixed number of times (the loop count) or exit when a certain condition or 
conditions are fulfilled, or both. 

As an example of the latter situation, consider interfacing to a peripheral which 
sets bit 6 of an interface device's Control register (e.g. a 6821 PIA) when it has 
valid data it wishes to be read. This involves continually checking the state of 
bit 6 in a loop until it goes high; only then do we move on and read the data. 
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EXIT: 


Figure 4.3 Using DBcc to implement a loop structure. 


REPEAT only IF 

both conditions are FALSE 
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But what happens if, say, due to hardware malfunction, this Data Ready signal is 
never sent? The software will then hang indefinitely. Perhaps it would be better 
to give up after a fixed number of times and go to an error routine if this sequence 
of events happens. To do this we would have to check the flag; if it is not set, 
then decrement the loop count, and if this hasn't fallen through zero (i.e. to -1) 
then repeat. Following the structure of Fig. l4.3l a possible coding is: 



MOVE.W 

#n,Dl 

Set loop count n 


LOOP: 

BTST 

#6,CONTROL 

Test bit 6 of the Control register 

16 


BNE 

EXIT 

IF True THEN EXIT (cc=Not Equal Zero) 

12 


SUBQ.W 

#1, D1 

ELSE decrement loop count 

4 


BCC 

LOOP 

IF no Carry then [Dl] is not -1 

18 

EXIT: 

CMP 

#-l,Dl 

Exit with n = -1? 



BEQ 

ERROR 

IF True THEN error 



MOVE.B 

PORT,DO 

ELSE read data from port 





and continue 


The alternative combines the decrement and two tests thus: 



MOVE.W 

#n,Dl 

Set loop count n (max 65535) 


LOOP: 

BTST 

#6,CONTROL 

Test bit 6 of the Control register 

16 


DBNE 

Dl,LOOP 

Decrement and repeat loop until True 

18 

; Pass here either IF True that bit 6 is 1 OR True that n = -1 


EXIT: 

CMP 

#-l,Dl 

Exit with n = -1? 



BEQ 

ERROR 

IF True THEN Error 



MOVE.B 

PORT,DO 

ELSE read data from port 





and continue 



For applications where speed is important (not this example) reducing the time 
taken by the control mechanism is important, as this housekeeping overhead is 
executed on each pass through the loop body. In this case the Test and Control 
is 34~ as against 50~. Notice that BNE is shown with an execution time of 12~, 
whilst BCC is 18~. This is because Branches taken (i.e. True) for byte offsets take 
longer than Branches not taken (but the opposite for word offsets, 18~ and 20~!). 
Similarly DBcc has a variable execution time. As the number n used by DBcc 
is limited to 65,536, the ordinary Branch construction must be used where the 
default timeout parameter exceeds this number. 

Some situations require the number of loop passes to be fixed. As the normal 
DBcc exits if either test is True, the variant DBF makes the first test always False, 
and so an exit only happens when the loop count reaches -1. The routine below, 
which is a fixed delay using an idle loop body, shows this: 


DELAY: 

MOVE.W 

#n, DO 

; n is the 

del ay 

parameter 

16~ 

LOOP: 

NOP 


; Do nothi 

ng and 

take 

4~ 


DBF 

DO,LOOP 

; one less 

pass 


18~ 


The total delay here is 16 + (n + 1) x 22 (+8 extra when DBF is True), a total of 
46 + 22 n clock cycles. Thus a 0.1 s delay requires 46 + 22 n = 8 x 10 5 ps at a 
clock rate of 8 MHz, giving n = 36, 363. Remember that n has a ma xi mum value 
of 65,536 for DBF. 
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Using the DBcc construct, the number of loop passes is n + 1, where n is 
the word-sized number preloaded into a Data register. It is possible initially to 
enter the loop directly into the control mechanism, as shown dashed in Fig. 14.31 
in which case the number of passes is just n. In this case no passes through 
the loop body will occur if n = 0 or the test is True (WHILE-DO loop construct) 
whereas the former situation always includes one pass irrespective (DO-WHILE 
loop construct). An example of this is shown in Table l5~4l 

The DBcc instruction can be confusing because it operates in the opposite 
sense to the analagous Bcc. Thus BEQ LOOP causes control to be passed to LOOP 
if the conditional test outcome is True (i.e. Z = 0). The si mi lar DBEQ LOOP does 
not transfer control to LOOP if the outcome is True, that is the processor escapes 
from the loop. Using the terminology Decrement and Branch until True' as 
opposed to ' Branch if True’ may help clarify the situation. Table R~8l summariz.es 
the complete 68000/8 instruction set. For each instruction, its operand size is 
given and allowable address modes are given. Finally, its effect on the five flags 
in the Code Condition register is tabulated. 
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Table R~8l Summary of 68000 instructions (continued next page). 


Instruction 

Size 

Address modes for [ea] 

Flags 



# Dn An (An) 

(An) + 

-(An) ±dj6(An) ±dg(An,Ri) 

A.L 

A.W ±di 6 (PC) ±d 8 (PC,Ri) 

X N Z V C 

ABCD Dx,Dy 

B 













V U V u v 

ABCD -(Ax),-(Ay) 

B 













yj u V U yj 

ADD [ea],Dx 

BWL 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

v'v'/v'V 

ADD Dx,[ea] 

BWL 




* 

* 

* 

* 

* 

* 

* 



yf yj yJ V V 

ADDA [ea],Ax 

WL 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 


ADDI H, [ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



yj yj yj yj yj 

ADDQ #K 3 ,[ea] 

BWL 


* 

* 

* 

* 

* 

* 

* 

* 

* 



yj yj yj yj yj 

ADDX Dx,Dy 

BWL 













yj yj yj yj yj 

ADDX -(Ax),-(Ay) 

BWL 













yj yJ yf V V 

AMD [ea],Dx 

BWL 

* 

* 


* 

* 

* 

* 

* 

* 

* 

* 

* 

• yj yj 0 0 

AND Dx,[ea] 

BWL 




* 

* 

* 

* 

* 

* 

* 



• yj yj 0 0 

ANDI #K,[ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• yj yj 0 0 

ANDI #K,CCR 

B 













yj yj yj yj yj 

ANDI M,SR P 

W 













yj yj yj yj j 

ASL/R Dx,Dy 

BWL 













yj yj yj yj -j 

ASL/R #d 3 ,Dx 

BWL 













yj yj yj yj yj 

ASL/R [ea] 

W 




* 

* 

* 

* 

* 

* 

* 



yj yj yj yj yj 

Bcc [label] 

BW 













* • • • • 

BRA [label] 

BW 













• * * • • 

BSR [label] 

BW 













• * * • • 

BCHG Dx,[ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



• • yj • • 

BCHG «, [ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



• • yj • • 

BCLR Dx, [ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



• * yj * * 

BCLR #K,[ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



* * yj * * 

BSET Dx,[ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



• • yj • • 

BSET #K,[ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



• • yj • • 

BTST #K, [ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



• • yj • • 

BTST Dx,[ea] 

BL 


L 


B 

B 

B 

B 

B 

B 

B 



• * yj • * 

CHK [ea],Dx 

W 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

• y u u u 

CLR [ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• 0 1 0 0 

CWP [ea],Dx 

BWL 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

• yj yj yj yj 

CMPA [ea],Ax 

WL 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

• yj yj yj yj 

CMPI #K,Dx 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• y/ y/ yj yj 

CMPM (Ax)+,(Ay)+ 

BWL 













• yj yj yj V 

DBcc Dx, [label] 

W 














DIVS [ea],Dx 

W 

* 

* 


* 

* 

* 

* 

* 

* 

* 

* 

* 

* V V V ® 

DIVU [ea],Dx 

W 

* 

* 


* 

* 

* 

* 

* 

* 

* 

* 

* 

• yj yj yj ° 

EOR Dx,[ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• yj yj 0 0 

EDRI #K,[ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• yj yj 0 0 

EORI #K,CCR 

W 













V V V \/ V 

EDRI #K,SR P 

W 













yj yj yj yj yj 

EXG Dx,Dy 

L 














EXT Dx 

WL 













• yj yj 0 0 

ILLEGAL 














• • • • • 

JMP [ea] 





* 



* 

* 

* 

* 

* 

* 

• • • • • 

JSR [ea] 





* 



* 

* 

* 

* 

* 

* 

• • • • • 

LEA [ea],Ax 

L 




* 



* 

* 

* 

* 

* 

* 

• * * • • 

LINK Ax,#K 














• * * • • 

LSL/R Dx,Dy 

BWL 













a/ V V o V 

LSL/R #d 3 ,Dx 

BWL 













yj yj yj 0 yj 

LSL/R [ea] 

W 




* 

* 

* 

* 

* 

* 

* 



yj yj y/ ° yj 


yj 

Flag operates in the normal manner. ♦ 

Not affected. 

u 

Undefined. 

P 

Privileged. S 

Source only. 

* 

Available. 

dn 

n-bit displacement. ^K m 

m-bit immediate number. 

± 

Sign extended 
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Table 4.8 (continued) Summary of 68000 instructions. 


Instruction 

Size 

Address modes for [ea] 

Flags 



# 

Dn 

An (An) 

(An) + 

-(An) 

±di 6 (An) 

±d 8 (An,Ri) 

A.L 

A.W ±d 16 (PC) ±dg(PC,Ri) 

X N Z V C 

HOVE [ea] , [ea] 

BWL 


* 

* s 

* 

* 

* 

* 

* 

* 

* 

* s 

# 

• y/ y/ 0 0 

HOVE [ea],CCR 

w 

* 

* 


* 

* 

* 

* 

* 

* 

* 

* 

* 

V V V V V 

HOVE SR,[ea] 

w 

* 

* 


* 

* 

* 

* 

* 

* 

* 




HOVE [ea] ,SR P 

w 


* 


* 

* 

* 

* 

* 

* 

* 



a/ a/ a/ a/ a/ 

HOVE USP,Ax p 

L 














HOVEA Ax,«SP p 

L 














HOVEA [ea],Ax 

WL 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 




HOVEH [ER] , [ea] 

WL 




* 


* 

* 

* 

* 

* 




HOVEH [ea],[ER] 

WL 




* 

* 


* 

* 

* 

* 

* 

* 


HOVEP Dx,±dig(Ay) 

WL 














HOVEP ±di 6 (Ay) ,Dx 

WL 














MOVEQ #±K 8 ,Dn 

L 













• « 

HULS [ea] ,Dx 

W 

* 

* 


* 

* 

* 

* 

* 

* 

* 


* 

• a/ a/ 0 0 

HULU [ea],Dx 

w 

* 

* 


* 

* 

* 

* 

* 

* 

* 


* 

• yf yj 0 0 

SBCD [ea] 

B 


* 


* 

* 

* 

* 

* 

* 

* 



sj U s/ u v / 

MEG [ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



a/ a/ a/ a/ a/ 

SEGX [ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



a/ a/ a/ a/ a/ 

WOP 















HOT [ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• a/ a/ 0 0 

OR [ea] ,Dx 

BWL 

* 

* 


* 

* 

* 

* 

* 

* 

* 


* 

• yj yj 0 0 

OR Dx, [ea] 

BWL 




* 

* 

* 

* 

* 

* 

* 



• * yj 0 0 

ORI #X,[ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• * yj 0 0 

ORI #K,CCR 

B 













yj V V V V 

ORI #K,SR P 

W 













V V v/ v/ x/ 

PEA [ea] 

L 




* 



* 

* 

* 

* 




RESET P 















ROL/R Dx,Dy 

BWL 













• a/ a/ ^ a/ 

ROL/R #d 3 ,Dx 

BWL 













• yj yj 0 yj 

ROL/R [ea] 

w 




* 

* 

* 

* 

* 

* 

* 



• yj yj 0 yj 

ROXL/R Dx,Dy 

BWL 













a/ yj yj o yj 

ROXL/R #d 3 ,Dx 

BWL 













yj yj yj 0 yj 

ROXL/R [ea] 

w 




* 

* 

* 

* 

* 

* 

* 



yj yj yj 0 yj 

RTE P 














a/ a/ a/ a/ a/ 

RTR P 














v/ v/ v 7 v/ V- 

RTS 















SBCD Dx,Dy 

B 













v/ U y U y 

SBCD -(Ax),-(Ay) 

B 













sj U y u v / 

See [ea] 

B 


* 


* 

* 

* 

* 

* 

* 

* 




STOP P 














a/ a/ a/ a/ a/ 

SUB [ea],Dx 

BWL 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

a/ a/ a/ a/ a/ 

SUB Dx, [ea] 

BWL 




* 

* 

* 

* 

* 

* 

* 



a/ a/ a/ a/ a/ 

SUBA [ea],Ax 

WL 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 

* 


SUBI #K,[ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



a/ a/ a/ a/ a/ 

SUBQ #K 3 , [ea] 

BWL 


* 

* 

* 

* 

* 

* 

* 

* 

* 



a/ a/ a/ a/ a/ 

SUBX Dx,Dy 

BWL 













a/ a/ a/ a/ a/ 

SUBX -(Ax),-(Ay) 

BWL 













a/ a/ a/ a/ a/ 

SWAP Dx 

W 













• a/ a/ 0 0 

TAS [ea] 

B 


* 


* 

* 

* 

* 

* 

* 

* 



• yj yj 0 0 

TRAP #K4 















TRAPV 















TST [ea] 

BWL 


* 


* 

* 

* 

* 

* 

* 

* 



• yj yj 0 0 

UNLK Ax 
















: Flag operates in the normal manner. 

• 

Not affected. 

u 

Undefined. 

: Privileged. 

.S' 

Source only. 

* 

Available. 

: n-bit displacement. 

#Km 

m-bit immediate number. 

± 

Sign extended 
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4.2 Address Modes 


Except for the few inherent operations which do not require data, such as ReTurn 
from Subroutine (RTS), some part of the instruction must be used to specify 
where or how to calculate the whereabouts of the operand(s). Broadly there are 
three methods of specifying an effective address (ea): 

1. Constant (fixed) data: Immediate. Here the data is part of the instruction and 
usually follows the op-code. Some instructions have quick varieties, such as 
ADDQ, which embed small immediate numbers (e.g. 1 to 8) in the op-code itself. 

2. Fixed location: Absolute memory or Register direct. The fixed memory ad¬ 
dress follows the op-code, or a register is specified as part of the op-code. 

3. Variable location: Address register or Program Counter register Indirect with 
optional fixed and variable offsets, where a register points to the operand. As 
such register contents can be changed in software at run time, the effective 
address is a variable. 


The use of the more complex address modes of category 3 are important in 
high-level language where data is often allocated space relative to a Stack Pointer 
rather than in absolute addresses. Computed addresses are also useful in access¬ 
ing data structures such as arrays and in producing position independent code, 
see pagel40l 

As an illustrated example, consider the problem of clearing an array of 1024 bytes 
located between E000 and E03F/i. Using only Absolute addressing, the routine 
would look something like this: 


CLEAR_ARR: CLR.B OEOOOh 

CLR.B OEOOlh 
CLR.B 0E002h 
CLR.B 0E003h 
CLR.B 0E004h 
CLR.B 0E005h 


Clear ARRAY[0] 
and ARRAY[1] 

each CLR occupies 6 bytes 
of program memory 
and takes 20 clock cycles 
Keep on going 


CLR.B 0E3FFh ; Clear ARRAY[1023]. Phew! 

This routine occupies 6144 bytes of program memory and takes 20,480 clock 
cycles (2 5 60 ps at 8 MHz) to execute. 

As we need to repeat the same operation 1024 times, clearly we have a prime 
candidate for using a loop construction, thus: 

CLEAR_ARR: M0VEA.L #0E000h,A0 ; Point A0 to ARRAY[0] 

M0VE.W #1023,DO ; Set up loop count less 1 in D0.W 

CL00P: CLR.B (A0)+ ; While [D0.W] > -1 

; Clear Array element pointed to by A0 and move pointer on one byte 

DBF DO,CLOOP ; Decrement loop count, exit on D0.W = -1 

This routine occupies 6 + 4 + 2 + 4 = 16 bytes of program memory and takes 
4 + 4 + (12 x 1024) + (10 x 1023) + 14 = 22, 540 clock cycles (2817.5 ps at 8 MHz). 
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In the first two instructions Immediate addressing is used to place constants. 
The loop body uses Address Register Indirect with Post-Increment addressing 
to walk through the array. Address register_A0 holds the address of the array 
element, and, after that address has been put out on the bus, is automatically 
incremented. Although the execution time of this address mode is shorter than 
for Absolute, as the address does not have to be fetched after the op-code, this 
is more than made up for by the overhead of the loop control DBF instruction, 
which takes 10~ when the loop is re-entered and 14~ for the final exit. Thus the 
quid pro quo for the reduction of program memory by a factor of 38,400% is an 
increase in execution time of around 10%. 

The rest of this section looks at the available address modes. Sizes are given 
for single-operand instructions, double-operands may require additional exten¬ 
sion words. 

Inherent 


op-code 


Inherent instructions make implicit reference to a register or registers. Thus 
RTS implies the use of the SSP and PC registers. The Branch instructions are 
sometimes listed under this category, implying the PC register; however, they 
can also be thought of as using a type of Program Counter with Displacement 
address mode. 

Immediate, #kl< 


op-code 



op-code 

constant 


op-code 

constant 


3 or ± 8-bit (Quick) 
8/16-bit (. B or . W) 
32-bit (.L) 


Here the operand is the data itself, not an address or pointer to an address. Gen¬ 
erally the constant follows the op-code as one or two words. Three instructions 
have Quick-Immediate variants where the data is embedded in the op-code itself, 
MOVEQ reserves 8 bits for the signed constant (+127 to -128) and ADDQ/SUBQ 
can only be used for unsigned 3-bit constants 1 to 8 (000 b represents 8 here). 
The instruction variants ADDI/SUBI permit constants of any applicable size to be 
added or subtracted directly on alterable memory locations, rather than on Data 
registers. Some examples are: 


ADD.L #1,D0 
ADDQ.L #1,D0 
ADDQ.W #l,0E000h 
ADDI.W #56h,OEOOOh 


[D0(31:0)]<-[D0(31:0)]+l (16~) 
[D0(31:0)]<-[D0(31:0)]+l ( 8~) 
[E000:1] <-[E000/1] +1 (20~) 
[E000:1] <-[E000/1] +56h (24~) 


{Coded D0BC-0000-0001h} 
{Coded 5280h} 

{Coded 5279-0000-E000h} 
{Coded 0679-0056-0000-E000} 


Notice the difference in size and execution time between the top two examples, 
which do the same thing. Of course ADDQ is limited to operand sizes of up to only 
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eight. The difference between ADDQ and ADDI for alterable memory destinations 
is not so great, but still significant. 

Direct or Absolute modes 

Three submodes are available which specify that the operand is in either a Data 
register, Address register or in absolute memory. 

Data Register Direct, Dn 


op-code 


The vast majority of instructions use a Data register as the destination, source or 
both — as listed in Table 14781 The op-code itself holds the register number(s) (see 
riii. l4.4i , so instructions using this address mode are short and also execute faster. 
Thus, where convenient, variables should be kept in a register. The first two 
examples under the Immediate heading also used Data Register Direct addressing 
as the destination; some other possibilities are: 

ADD.L DO,D1 ; [Dl(31:0)] <- [Dl(31:0)] + [D0(31:0)] {Coded as D280h} 

ADD.B D1,0E000h; [EOOO] <- [E000] + [Dl(31:0)] {Coded as D339-OOOO-EOO0h} 

Address Register Direct, An 


op-code 


Addresses stored in an Address register can point to data for most instructions, 
but only the special instructions ADDA, SUBA and MOVEA can also target and hence 
change these pointers. The ADDQ and SUBQ variants can also target any Address 
register in . W or . L sizes. They are useful to increment or decrement pointers. 
Some examples are: 

ADD.L AO,DO ; [00(31:0)] <- [D0(31:0)]+[A0(31:0)] {Coded as D188h} 

ADDA.W #8000h,Al ; [Al(31:0)] <- [Al(31:0)]+FFFF8000h {Coded as D2FC-8000h} 

SUBQ.L #1,A1 ; [Al(31:0)] <- [Al(31:0)]-OOOOOOOlh {Coded as 5389h} 

Note again that any operation changing Address register contents always acts 
on all 32 bits, and if word-sized (no byte size allowed), will be sign extended as 
shown in the second example above. 

Memory Direct (or Absolute), M 


op-code 

address 


op-code 

address 


± 16-bit (Short) 
3 2-bit (Long) 


The absolute address itself directly follows the op-code in this mode. In the 
short-form version, only a 16-bit address is specified, and this is sign-extended 
in the usual manner before being sent out on to the address bus. The applicable 
range for this is 00007FFFh to 00000000k and FFFFFFFFh to FFFF8000h. Con¬ 
ceptualizing the memory map as a grand circle, this can be thought of as a range 
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from +0 up to +32,767 and back to -32, 768. The long form will of course specify 
any address directly, but occupies an extra word of program memory and thus 
takes an extra Read cycle (4~) during the fetch phase. Two examples are: 

MOVE.W 500h,DO ; [D0(15:0)] <- [00000500:1] {Coded as 3038-0500h} 

MOVE.W 9000h,DO ; [DO(15:0)] <- [00009000:1] {Coded as 3039-0000-9000h} 

Absolute addresses are by definition constant as part of the program (except 
with risque self-modifying code) and as such are most useful for specifying data 
from I/O ports, which are fixed in the memory map by virtue of their hardware 
decoder. 

Register Indirect Modes 

The most flexible of the address modes; this group generates the effective ad¬ 
dress (ea) as a simple function of the contents of an Address register or the Pro¬ 
gram Counter. As the state of such a register is not constant, it maybe changed at 
any time to reflect the current storage requirements of the program, and may be 
systematically advanced or retarded to deal with arrays or other data structures. 
The opening example of this section on page 11061 demonstrated this flexibility. 

Address Register Indirect, (An) 


op-code 


Here an Address register holds the location of the operand in memory, that is 
points to the operand. The term Indirect is used, as the register does not hold 
the data itself. Thus: 

MOVEA.L #0E100h,A0 ; [A0(31:0)] <- #OOOOE100 {Coded as 207C-0000-E100h} 

ADD.B (A0),DO ; Adds contents of ElOOh to DO {Coded as D010h} 

has the same affect as ADD. B OEOOOh , DO, but of course once A0 is set up, the 
shorter and faster indirect access can be used, and the target address dynamically 
altered by changing the contents of A0. 

Address Register Indirect with Displacement, ±dig(An) or (±dig,An) 


op-code 


displacement 


Similarly to the previous mode, a 16-bit displacement is used to define a signed 
offset of between +32,767 to -32, 768. As an example, if we assume that we have 
two arrays, one starting at EOOO/i and the other at E200h, then, assuming A0 has 
been pointed to El 00 h by the previous example, the sequence: 


MOVE.B -100h(AO),D0 ; Get ARRAY_1[0] {Coded as 1028-FF00h} 

ADD.B 100h(AO),D0 ; and add to it ARRAY_2[0] {Coded as D028-0100} 
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puts the sum of the first two array elements in DO.B. 

Of course the displacement is a fixed part of the program, but if necessary we 
can still change the base address in AO. 

Address Register Indirect with Pre-decrement/Post-increment, - (An)/ (An) + 


op-code 


There are two modes here, both of which automatically modify the designated 
Address register, which points to the operand. The former decrements the ef¬ 
fective address by one, two or four for a byte, word or long object respectively 
before the operation. In the latter case the Address register holds the ea, which, 
after the operation is complete, is incremented by the appropriate one, two or 
four. 

We have already illustrated these modes in use, see Fig. 14. II and the opening 
example on page 11061 where we cleared an array. As a further example, which 
also uses the previous indirect modes, consider the problem of digitally low-pass 
filtering this same array. Taking the 1024 byte-array elements already stored 
between locations E000 and E3FF/i as samples in advancing time, originating 
from, say, an analog to digital converter, then the 3-point algorithm [9] is given 
as: 


vr X[n] , X[n - 1] X[n - 2] 
y [n] = — +- - -+- - - 

where n is the sample number, X[n] the existing nth array sample and Y[n] the 
new filtered nth array element. 

The following listing starts at the top of the X array and works its way down 
overwriting this with the new Y array: 


MOVEA.L 
LOOP: MOVE.B 
LSR.B 
MOVE.B 
LSR.B 
ADD. B 
MOVE.B 
LSR.B 
ADD. B 
MOVE.B 
CMPA.L 
BNE 
RTS 


#0E400h,AO 
-(AO),D0 
#2, DO 
-1(A0),D1 
#1, D1 
D1, DO 
-2(AO),D1 
#2, D1 
D1, DO 
DO,(AO) 
#0E002h,AO 
LOOP 


Point AO to one past X[1023] 

Decrement pointer and then get X[n] 
Divide by 4 

Get X[n-1] (AO unchanged) 

Divide by 2 

Y[n] = X[n]/4 +X[n-l]/2 
Get X[n-2] (AO unchanged) 

Divide by 4 

Y[n] = X[n]/4 + X[n-l]/2 + X[n-2]/4 
Overwrite X[n] by Y[n] 

Check for end, cannot go lower than X[2] 
IF not repeat with n to be decremented 
Exit 


Notice that AO points to X[n] and it is automatically decremented on each pass 
through the loop. Notice also the edge effect in that Y[0] = A[0] and Y[l] = 
*[!]■ 
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Address Register Indirect with Index, 


op-code 


X reg./disp. 


±d 8 (An,X.W) / ±ds(An,X.L) or 
(±d 8 ,An,X.W) / (±d 8 ,An,X.W) 


This mode offsets the contents of a designated Address register with both a 
constant and a variable to give the effective address. The variable index can be 
the contents of any Address or Data register. Either the entire 32 bits (. L) or a 
sign-extended 16 bits (. W) can be used. The constant is a signed 8-bit byte. Thus 
we have: 


< ea >= ±d 8 + X.L + An or ± d 8 + SEX|X.W + An 

As an example consider a subroutine to convert a decimal 0-9, passed in 
DO.B to its 7-segment equivalent returned in the same place. The 7-segment 
equivalents are stored sequentially as a table (array) of 10 bytes following the 
subroutine. We assume the subroutine starts at 0600 h. 

(600/605) M0VEA.L #TABLE_B0T,A0 ; Point A0 to table 

(606/609 1030-0000) M0VE.B 0(A0,DO.W),DO ; Get element [D0(15:0)] 

(60A/60B 4E75) RTS ; and return 

(60C/610 01-4F-12-06-4C) TABLE_B0T:.BYTE 1,4Fh,12h,6,4Ch; 7-segment code 
(611/615 24-20-0F-00-0C) .BYTE 24h,20h,OFh,0,OCh 

If we assume that D0.W is 0004/i on entry, then the first instruction puts 
the absolute address of the first table element (O 6 OC/ 1 ) into A0. The effective 
address calculated in the following instruction is 00 + [A0] + SEX | D0(15 :0), 
in this case 00 + 0000060C + 00000004 = 0000061 Oh. The data in this byte is 
4C h, and this is the value moved to D0(7:0) prior to return. 

As can be seen from this example, this mode is useful for random access 
into an array, with the array number (or a multiple of, for word or long-word 
arrays) being in the Index register. It is instructive to compare this example with 
its equivalent 6809 code on page [39] which used an Accumulator to hold the 
variable offset and one of the Index registers to hold the base address. 

Program Counter Indirect with Displacement, ±die(PC) or (±dig,PO 


op-code 


displacement 


This is similar to Address Register Indirect with Displacement but this time the 
Program Counter is the specified register. For example in: 

MOVE.B 200h(PC),DO ; [D0(7:0)] <- [[PC]+200h] 

the data 200 h bytes on from where the PC is (actually pointing to the next in¬ 
struction) is placed in DO.B. This of course is not an absolute address, as only 
the distance from the instruction is of interest. Like the relative Branch instruc¬ 
tions, a label is normally used for the destination and the assembler evaluates 
the appropriate offset. 
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Program Counter Indirect modes are used to generate position independent 
code (PIC) as described on page |40l As an example, referring back to the 7- 
segment decoder just listed, we see inline 1 that the absolute address of the table 
base, OOOOO 6 OC/ 1 , is placed in Address register AO.L. If, say, the subroutine were 
to be relocated to start at 1 780 h, then the ROM would have to be reprogrammed 
to change the extension word of the MOVEA instruction from O 6 OC /1 to 1 78Ch, the 
rest of the code remaining the same. Here is a PIC version of the same subroutine: 


(600/603 41FA-0006) LEA 6(PC),A0 ; Point AO to table 

(604/607 1030-0000) M0VE.B 0(A0,D0.W),DO ; Get element [D0(7:0)] 

(608/609 4E75) RTS ; and return 

(60A/13 _) TABLE_B0T:.BYTE etc. ; 7-segment code 

The only difference between the two programs is in line 1. Previously the absolute 
address of the table bottom was put into A0. In the PIC case, A0 is loaded with 
the contents of PC plus 6 , which is again the address of the bottom of the table, 
but is calculated at run time. If we were to relocate the subroutine to start at 
1 780/t, nothing would change. 

In practice, if the first line of the program were: 

LEA TABLE_BOT(PC),A0 

the assembler would produce the same code (41 FA-OOO 6 / 1 ), evaluating the dif¬ 
ference between TABLE_B0T and the location of the following instruction, that is 
6 bytes. The absolute value of TABLE_B0T is not used as the offset — as in the 
case of Branch instructions. 

Note the use of Load Effective Address to move the ea generated by any ad¬ 
dress mode (except Pre-Decrement and Post-Increment) into an Address register. 
Some other examples are: 

LEA 20(A7),A7 ; Move Stack Pointer up 20 bytes 

LEA 20(A0,D7.L),A1 ; Add AO.L to D7.L plus 20 and put into Al.L 

LEA is long-word sized only, and must solely target an Address register. 


Program Counter Indirect with Index, 


op-code 


X reg./disp. 


±d 8 (PC,X.W) / ±d 8 (PC,X.L) or 
(±d 8 ,PC,X.W) / (±d 8 ,PC,X.L) 


This is similar to Address Register Indirect with Index in that a constant offset 
plus a variable offset in either an Address or Data register is added to the PC to 
give an effective address. The assembler permits a label to be used as the con¬ 
stant, and will calculate the required difference. Using this mode the 7-segment 
program reduces to: 

(600/603 103B-0002) M0VE.B TABLE_BOT(PC,D0.W),DO 

(604/5 4E75) RTS 

(606/F _) TABLE_B0T: .BYTE etc. 
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Note the offset of 02 in the machine code generated by the first instruction. 

The offset permissible for this mode is only +127 to -128, which represents 
a considerable limitation compared to the plain offset-mode with a range of 
+32,767 to -32, 768 (both ranges have been extended for the 68020 MPU). 

The twelve address modes covered there are summarized in Table 14791 Except 
for the two Register Direct modes, additional time is needed to calculate the 
effective address. Some of this may be due to the necessity to fetch one or more 
extension words, and some due to the address arithmetic. As an example, the 
base time to CLeaR a memory byte is 8 clock cycles (4 to read the op-code and 
4 to send out the zero on the data bus). Thus from the table, CLR. B (An) takes 
8 + 4 = 12~, CLR. B 0E04567h takes 8 + 12 = 20~. Reference @1 gives timings for 
all instructions. The 68008 takes longer to generate eas for most operations due 
to its byte-sized Data bus. 


Table 4.9 A summary of 68000 address modes. 


Address mode 

ea 

Extra cycles 68000/8 

Code 



Byte 

Word 

Long 

Mode:Register 

Dn 

Dn 

0/0 

0/0 

0/0 

OOO+rr 1 

An 

An 

0/0 

0/0 

0/0 

001 :rrr 

(An) 

[An] 

4/4 

4/8 

8/16 

01 0:rrr 

(An)+ 

[An] + 

4/4 

4/8 

8/16 

011: rrr 

-(An) 

[-An] 

6/6 

6/10 

10/18 

1 00:rrr 

±d] 6 (An) 

[An+d 16 ] 

8/12 

8/16 

12/24 

1 01 :rrr 

±d s (An,X) 2 

[An+X+d 8 ] 

10/14 

10/18 

14/28 

11 0:rrr 

±d 16 (PC) 

[PC+d 16 ] 

8/12 

8/16 

12/24 

111:010 

±d 8 (PC,X) 2 

[PC+X+d 8 ] 

10/14 

10/18 

14/26 

111:011 

abs.W 

sex|<abs value> 

8/12 

8/16 

12/24 

111:000 

abs.L 

<abs value> 

12/20 

12/24 

16/32 

111:001 

immediate 


4/8 

4/8 

8/16 

111:100 


Note 1: A 3-bit code indicating the target register for modes 000 b to 11 0b, 
otherwise a submode. 

Note 2: The Index register, which can be any Data or Address register, is specified 
as a 4-bit code in the extension word, which also carries the 8-bit offset. 


Not all address modes are legitimate in many situations. For example, an Im¬ 
mediate operand by definition cannot be specified as the destination ea. Also, 
but not so obviously, the two Program Counter Indirect modes are also illegal for 
a destination operand. This is because it is considered bad practice to modify 
program code, and in any case the area around the PC will frequently be in ROM 
and therefore cannot be altered. The group of address modes excluding PC Rel¬ 
ative and Immediate are referred to as Alterable. Those also excluding Address 
Register Direct are categorized as Data Alterable. In general, except for special 
instructions such as ADDX, all address modes may be used as a source operand. 
The destination operand may be a Data register only, an Address register only 
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or, in more comprehensive operations, such as MOVE and ADD, a Data Alterable 
mode may be specified. Except for MOVE, one of the operands must be a register. 
Table l4~8l summarizes the permitted address modes for each instruction. 

Table 14.91 also lists a 6-bit code against each mode. This is the bit pattern 
used in the op-code to specify the address mode for both source (if present) and 
destination. Two examples are given in Fig. 14.41 Of course it is not necessary 
for the programmer to work out the binary code for an instruction, unless he 
or she suspects the assembler's integrity — I did once find an assembler which 
incorrectly coded one instruction-address mode combination. After all this is 
the main raison d'etre for using an assembler. 


Instruction Code Size Destination Address Mode 


15 

14 

13 

12 

1 

10 


8 

7 

5 4 3 2 1 0 

0 

1 

0 

0 

0 

0 

1 

0 

00 (.B) 
01 (.W) 
10 (.L) 

i i 1 i i 

Mode Register 

_ i . !_ i _ i 

(a) One operand; clr.b 

(A3) 

+ = 

01000010/00/100:011 (4223h). 


.Code 

s 

ze 

Destination Address Mode 

Source 

Address Mode 

15 

14 

1 

12 

1110 9 8 7 

5 4 

3 2 10 



01 

(.B) 

i i | i i 


1 1 1 

0 

0 

11 

(.W) 

Register Mode 

_i_i_i_i_i_ 

Mode 

Register 

_i_i_i_ 



10 

■L 



(b) Two operands; move.w D7, (AO) = 00/11/000:010/000:111 (3087h). 


Figure 4.4 Two examples of machine coding. 


4.3 Example Programs 

The last few sections used program fragments to illustrate various instruction/address 
mode combinations. Here we finish our introduction to 68000 software by devel¬ 
oping three programs of a slightly more elaborate nature. These will implement 
similar functions to those coded in 6809 assembly language in Section 2.3, and 
this will allow comparison between the software of the two processors. 

As in Section 2.3 we are using the Real Time Systems XA8 cross-assembler, 
the syntax and format rules of which were discussed at that point. There are two 
minor differences which are relevant here. 6809 assembly language assigns the 
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Table 4.10 Object code for sum ofn integers program. 

1 .processor m68008 

2 

3 

4 

5 

6 
7 


8 




.psect 

_text 

Direct code into text area 

9 

; for (sum=0;n>=0 

n—){ 




10 

000400 

02800000FFFF 

SUM_0F_N: 

and. 1 

#0000FFFFh,dO 

n promoted to long 

11 

000406 

4281 


clr.l 

dl 

Sum initialized to 00000000 

12 

000408 

D280 

SLOOP: 

add. 1 

dO, dl 

sum = sum + n 

13 

00040A 

51C8FFFC 


dbf 

dO,SLOOP 

n--, REPEAT WHILE N>-1 

14 

00040E 

4E75 

S_EXIT: 

rts 



15 




. end 




FUNCTION : Sums all unsigned word numbers up to n (max 65,535) 
ENTRY : n is passed in Data register DO.W 
EXIT : Sum is returned in Data register Dl.L 


source operand to the operand field and destination to the instruction mnemonic, 
for instance: 

LDB 1234h ; [B] <- [1234h] {[<Destination>] <- [<Source>]} 

LDY #0E000h ; [Y] <- #E000h {[<Destination>] <- <Source>} 

In 68000 assembly language, the mnemonic does not contain any operand 
information, and any operands appear explicitly or implicitly in the operand field 
as <source>,<destination>, for example: 

MOVE.B 1234h,D0 ; [DO(7:0)] <- [1234h] {[<Destination>] <- [<Source>]} 

MOVEA.L #0E000h,A0 ; [A0C31:0)] <- #0000E000h {[<Destination>] <- <Source>} 

However, the size of the operands are indicated in the mnemonic field by the 
extension . B, . W or . [ as appropriate. Both operands are the same size. 

One quirk peculiar to the XA8 cross assembler is the treatment of the MOVE 
Multiple (M0VEM) instruction. The standard Motorola way of representing a 
range of registers is to use the - range operator, for example D0-D3 meaning 
D0/D1/D2/D3. Thus the two ways of indicating a Push of the registers DO to D3 
and A0 on to the System stack are: 

MOVEM.L D0-D3/A0, -(A7) ; Not used by XA8 assembler 

MOVEM.L DO/D1/D2/D3/AO, -(A7) ; Applicable to all assemblers 

The XA8 assembler unfortunately does not support the - range operator. 

Each program module is written in the form of a complete subroutine, with 
data assumed present on entry in some place, usually a Data register, and termi¬ 
nated by a Return from Subroutine (RTS) instruction. We will look at subrou¬ 
tines in some detail in Chapter 5. 

Our first program generates the sum of all integers up to a maximum n 
of 65,535 (FFFFh). We assume that n is passed to the subroutine in the lower 
word of DO. The maximum possible sum of 2,147,450,880 fits comfortably in 
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Table 4.11 A superior implementation. 
.processor m68008 


FUNCTION : Sums all unsigned word numbers up to n (max 65,535) 
ENTRY : n is passed in Data register DO.W 
EXIT : Sum is returned in Data register Dl.L 

EXIT : No other registers disturbed 


7 

8 
9 



psect 

_text ; 

Direct code into text area 

10 

; sum = n*(n+l)/2 




11 

000400 

3200 SUM_0F_N: 

move.w 

dO,dl ; 

Copy n into dl.w 

12 

000402 

5241 

addq .w 

#1,dl ; 

which becomes n+1 

13 

000404 

C2C0 

mul u 

dO,dl ; 

n*(n+l) now in dl.l 

14 

000406 

E289 

lsr.l 

#1,dl ; 

Divide to give n*(n+l)/2 

15 

000408 

4E75 S_EXIT: 

rts 



16 



end 




the 32-bit D1 for return. Compare this with the n = 255 l imi t in the 6809 equiva¬ 
lent on page[TT]due to its smaller registers, although of course external memory 
could have been used for larger operands. 

The algorithm used in the listing of T able RTTOl simply clears Data register_Dl, 
which will hold the 32-bit sum, and also the upper 16 bits of DO. This latter oper¬ 
ation effectively promotes the word-sized parameter n, passed to the subroutine 
in DO.W, to long-sized. The equality is necessary for the addition of line 12, 
which adds the progressively decrementing n to the partial sum. The loop con¬ 
trol DBF implements this decrementation using n both as the operand and the 
loop counter. When n drops below zero, the loop terminates and the final sum 
is in D1 .L as specified. 

The object code shown in Table |4~Tol is the result of passing the source code 
file through the assembler and then the linker-loader, as described in Section 7.2. 
All 68000-based programs in this book assume ROM from 0400h up for the pro¬ 
gram sections designated _text and RAM from EOOOh up for the _data sec¬ 
tions. Only _text is needed in this case. The program is 16 bytes long and takes 
54 + 14n clock cycles to execute (maximum 114,694.75 ps at 8Mhz). 

The alternative direct algorithm: 

(n + 1) 
sum = n x —-— 

is shown coded in Table [471X1 This copies n into D1 .W, adds one, multiplies 
to give the long n x (n + 1) and then divides by two using a Shift Right once 
operation. Only 10 bytes in length, it takes 104 clock cycles to execute (13 ps 
at 8 MHz) irrespective of n. However, like its 6809 equivalent of Table [2H0l one 
value of n will give an erroneous zero answer. It is left to the reader to determine 
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which, and to devise a means to avoid this problem. 


Our second example involves converting a binary number to a string of ASCII- 
coded digits, terminated with 00 h (ASCII NULL). In Fig. I2.4l we implemented this 
by evaluating the nth digit as the number of successful subtractions by 10", 
starting with the maximum n and moving down to zero. The values of 10" were 
stored as a table of constants. We used this technique in preference to the usual 
algorithm of continually dividing by ten, with the remainders giving the digits, as 
the 6809 MPU has no Division operation. This is not the case for the 68000 family, 
and so this is the approach taken in the listing of Table 14.1 21 As an example: 


Fifth digit 


65536-10 = 06553r6 
06553-10 =00655r 3 
00655-10 =00065r 5 
00065-10 =00006r 5 
00006-10 =00000r 6 


Fourth digit 
Third digit 
Second digit 
First digit 


The conversion loop simply divides repetitively by ten the long binary number 
passed in DO, producing the 16-bit remainder in the top of DO and the 16-bit 
quotient at the bottom. SWAP (line 19) is used to reverse the order of these, and 
with the quotient safely at the top, the following Convert to ASCII and Move-byte 
operations leave this undisturbed (lines 20 to 22). Finally, clearing the remainder 
and swapping again restores the quotient as a 32-bit quantity ready for the next 
32 - 16 bit DIVU. 

Data register_Dl .W is used with DBF to give 5 passes around the loop, and A0 
is used as a pointer to the next RAM byte for the string digits, in conjunction with 
the Post-Increment address mode. Multiple MOVEs at the start and end of the 
subroutine Push and Pull all use registers into the System stack, and ensure that 
the internal state (except the CCR) is returned unaltered on completion. 

Unlike the 6809 equivalent in Table I2T21 the binary number is not restricted 
to FFFF/i (65,535). As we have coded the algorithm for five digits, the upper limit 
is 99,999. Changing line 16 to M0VEQ #5 , D1 (i.e. six digits) will increase this to 
655,359 before overflow occurs. The reason the limit is not 999,999 is the 16-bit 
quotient produced by DIVU. The 68020 MPU has a 32 x 32 divide, giving a 32-bit 
quotient and remainder (e.g. DIVUL #10,D0:D1 puts the 32-bit quotient in DO 
and 32-bit remainder in Dl). With the 68000's DIVU, one approach is initially 
to divide the binary number by 10,000, the quotient then holding the upper five 
digits and the remainder the lower five digits. Each half is then processed as 
shown. The limit thus is 4,294,967,295. Coding this is left as an exercise for the 
reader. 


Our final example is the evaluation of the factorial of an integer n passed to 
the subroutine in the lower byte of Data register DO. n! is returned as a long-word 
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Table 4.12 Binary to decimal string conversion. 
.processor m68000 


Converts binary code (max 99,999 decimal) to a string of five 
ASCII-coded characters, terminated by 00 (NULL) 


EXAMPLE 

ENTRY 

EXIT 

EXIT 


3 

4 

5 

6 

7 

8 
9 

10 

11 

12 ; Initialize data and poi 

13 000400 48E7C080 BIN_2_BCD: 

14 000404 207C000E006 

15 00040A 4220 

16 00040C 7204 

17 ; Divide by 10 five times 

18 00040E 80FC000A BL00P: 

19 000412 4840 

20 000414 06000030 

21 000418 1100 

22 00041A 4240 

23 00041C 4840 

24 00041E 51C9FFEE 

25 ; 

26 000422 4CDF0103 

27 000426 4E75 


OOOOFFFF -> '6"5"5"3"5’NUL (36/35/35/33/35/OOh) 
Binary in D0.L 

Decimal string in 6 RAM bytes starting from DEC_STRG 
All register contents except CCR unchanged 


list 
psect 
nter 
movem. 
movea. 
clr.b 
moveq 
the 
di vu 
swap 
add. b 
move. b 
cl r.w 
swap 
dbf 


+.text 
_text 


Direct code into text area 


1 dO/dl/aO,-(sp); Save everything except CCR 
1 #DEC_STRC+6,aO; Point aO to top of string 


-CaO) 
#4, dl 


Put a null at this point 
Loop counter 5-1 = 4 


remainders giving the decimal digits 


#10,dO 
dO 

#'0',dO 
dO,-(aO) 
dO 
dO 

dl,BL00P 


Divide by ten 
Remainder to lower word 
converted to ASCII (add 30h) 

Move down one char & put it out 
Zero this remainder 
& get quotient back in word form 
Dec count & repeat unless -1 


movem.1 (sp)+,d0/dl/a0; Return everything except CCR 
rts 






28 ; 

29 ; This is the area of RAM where the number string is returned in order 

30 ; TEN_TH0U THOU HUNDS TENS UNITS NULL from DEC_STRG to DEC_STRG+5 

31 .psect _data ; Variable data space 

32 00E000 DEC_STRC:.byte [6] ; Reserve six bytes for string 

33 .end 


in Dl. As we observed on page[50] this restricts n to no more than 12, and to 
signal a value outside this range, D0.L is used to return an error status, -1 for 
error and 0 for success. 

As in Section 2.3, there are two techniques for tackling problems of this nature. 
The direct method uses the mathematical definition of factorial as the product 
of all integers up to and including n (with the exception of 0! = 1), as shown 
in Fie. 12.SI Although the 6809 MPU has a multiplication instruction, its 8 x 8 field 
size meant that the necessary 32x8 products had to be evaluated as four separate 
operations together with the necessary shifting and addition. Furthermore the 
growing product had to be kept externally in four memory bytes, all of which led 
to the messy coding of Table IZT31 

Matters are somewhat improved in the 68000 with its 16 x 16 multiply and 
32-bit Data registers. Implementing a 32 x 8 multiplication now involves the 
process: 
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Table 4.13 Mathematical evaluation of factorial n. 


1 



.processor m68000 


3 

; * EXAMPLE : 

n = 12 ; n I 

= 479,001,600 

* 

4 

; * ENTRY : 

n in lower 

byte of dO; maximum value 12 * 

5 

; * EXIT : 

n! in 32-bit dl 


* 

6 

7 

; * EXIT : 

dO.1 = -1 

(FFFFFFFFh) if error (n>12) ELSE 0 

8 



. defi ne 

ERROR = -1 


9 

; Initialize 





10 



.psect 

text 


11 

000400 48E73000 

FACTORIAL: 

movem. 

d2/d3,-(a7) 

Save these registers on Stack 

12 

000404 024000FF 


and .w 

#00FFh,dO 

n extended to 16 bits 

13 

; Error conditions 




14 

000408 0CO0 


cmp. b 

#12,dO 

IF n>12 THEN error condition 

15 

00040C 6304 


bis 

CONTINUE 

ELSE continue 

16 

00040E 70FF 


moveq 

#ERR0R,d0 

FFFFFFFFh in dO signals error 

17 

000410 6022 


bra 

ERR_EXIT 

and exit with it 

19 

000412 7201 

CONTINUE: 

moveq 

#1, dl 

Initialize sum to 00000001 

20 

; N<=1? 





21 

000414 OC000001 

0UTER_L00P 

:cmp.b 

#1, dO 

IF n<=l then answer is in dl 

22 

000418 6318 


bis 

FEXIT 


23 

00041A 3401 

MUL LOOP: 

move.w 

dl, d2 

Lower word of sum to d2 

24 

00041C 3601 


move.w 

dl, d3 


25 

00041E 4843 


swap 

d3 

Upper word to d3 

26 

000420 C4C0 


mul u 

dO, d2 

First product (n*sum.l) in d2 

27 

000422 C6C0 


mul u 

dO, d 3 

Second product (n*sum.u) in d3 

28 

000424 4843 


swap 

d3 

Move it to the upper word 

29 

000426 02430000 


and .w 

#0, d 3 

Zeroing the lower word 

30 

00042A 2202 


move.1 

d2, dl 

Begin to build the new sum 

31 

00042C D283 


add. 1 

d 3 , d 1 

Sum of products 

32 

; n=n-l 





33 

00042E 5300 


subq. b 

#l,d0 


34 

000430 60E2 


bra 

OUTER LOOP 


35 

000432 4280 

FEXIT: 

clr.l 

dO 

Zero indicates no error 

36 

000434 4CDF000C 

ERR_EXIT: 

movem.' 

(a7)+,d2/d3 

Get used registers from Stack 

37 

000438 4E75 


rts 



38 



. end 




SUM.U 

SUM.L 

Split sum of products into two words 

X 

M 

Multiplied by word M 


SUM.L X M 

1st product 


+ SUM.U x M 2nd product shifted left 16 bits 

New sum of products 


Firstly the 3 2-bit sum of products is split into two words each of which is 
multiplied by n (promoted to word size in line 12). The second product is shifted 
left 16 places and the two products added to give the new sum. Repeating this 
with M decrementing from n to 1 gives the loop algorithm of Table ld. 1 !l lines 21- 
34. 

Splitting up the sum of products, using a word MOVE from D1 (holding the 
32-bit sum) to D2.W, gives the 16-bit SUM.L. Moving all of D1 to D3 and then 
swapping words (SWAP D3) puts the 16-bit SUM.U in the lower word of D3. The 
two MULUs of lines 26 and 27 then give the two sub-products. The second of these 
is moved left 16 places by doing a SWAP and clearing the lower 16 bits. Finally, 
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1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 
27 


Table 4.14 Factorial using a look-up table. 
.processor m68000 


EXAMPLE 

ENTRY 

EXIT 

EXIT 


n = 12; n! = 479,001,600 
n in lower byte of dO; maximum value 12 
n! in 32-bit dl 

dO.1 = -1 (FFFFFFFFh) if error (n>12) ELSE 0 


Initialize 


.define ERROR = -1 
.list +.text 

.psect _text 


000400 48E70080 FACTORIAL: movem.l a0,-(a7) 


000404 024000FF 
000408 207C00000426 
; Error conditions 
00040E OCOOOOOC 
000412 6304 
000414 70FF 
000416 6008 

000418 E508 CONTINUE: 
00041A 22300000 
00041E 4280 FEXIT: 
000420 4CDF0100 ERR_EXIT: 
000424 4E75 


and.w #00ffh,d0 
movea.l #TABLE,a0 

cmp.b #12,dO 
bis CONTINUE 

moveq #ERROR,dO 
bra ERR_EXIT 


lsl .b 
move.1 
clr.l 
movem.1 
rts 


#2,dO ; 
0(a0,dO.w) 
dO ; 
(a7)+,a0 ; 


Save aO.L on Stack 
n extended to 16 bits 
Point aO to bottom of table 

IF n>12 THEN an error condition 
ELSE continue 

Put FFFFFFFFh in dO signals error 
and exit with it 

Multiply n by 4 as table is 4-wide 
,dl;Get long-wrd at [a0]+[d0] to Dl 
Zero indicates no error 
Retrieve aO.l from Stack 


Now the table of factorials which is in the text (ROM) area 


28 000426 


TABLE: 


. doubl e 


1, 1, 2, 6, 24, 120, 720, 5040, 40320, 
362880, 3628800, 39916800, 479001600 


00000001 

00000002 

00000006 

00000018 

00000078 

000002DO 

000013BO 

00009D80 

00058980 

00375FOO 

02611500 

1C8CFC00 


29 


. end 


they are summed in Dl (lines 30 and 31) to give the grand total. Decrementing n 
(line 33) completes the loop. 

Once again this example is easier to implement with the 68020 MPU, which has 
a 32 x 32-bit multiply MULU. L. This would avoid the need to split the multiplicand 
in two and later combine the two sub-products. 

On entry to the loop, n is tested for 1 or 0, and if True the subroutine is 
exited with D0.L cleared. The alternative exit if n > 12 (lines 14 and 15) puts 
FFFFFFFFh (-1) in D0.L to signal error and bypasses the clearing operation. 

Where no simple mathematical algorithm exists to specify a function, using 
a table of outcomes is the only approach, for example the 7-segment decoder 
of page lllll Although this is not the case here, there are only 13 successful 
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outcomes to the subroutine, and the use of a look-up table is an attractive propo¬ 
sition. 

Using this approach, the resulting coding of Table l4~i~4l shows the active por¬ 
tion of the program (i.e. excluding error checking and reporting, which is the same 
as the previous listing) to be only lines 21 and 22. The first multiplies n by four to 
match the size of the table entries. This is then used as the Index register (DO.W) 
to point into the table, with AO holding the base address 0426 h (TABLE). For exam¬ 
ple, if n = 4 then [D0(1 5:0)] becomes 1 Oh (4x4) and MOVE. L 0(A0, DO. W) , D1 ef¬ 
fectively moves the 4 bytes starting at 0+ [AO] + [DO(15:0)] = 0 + 0426h +1 Oh = 
0436 h to D1 .L. The contents of 0436:7:8:9h are 24 (00 00 00 1 8h), as required 
for n\. 
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CHAPTER 5 

Subroutines , Procedures and 
Functions 


A subroutine may be defined as a self-standing sequence of instructions which 
may be called from anywhere and, having been run, will return control whence 
it was called. Thus, for example, the code for the calculation of sin(x) may be 
stored offside the main program. To exercise the function: 

y = sin(x); 

the program must jump out to the code, carrying with it the value of x. After 
execution, the outcome y will be found at some prearranged location. 

Subroutines are primarily used to reduce the size of the overall code, since 
they may be successively called from many points outside, including other sub¬ 
routines, and even from inside itself (when they are known as recursive)! For 
example, the calculation of sine may be needed at five different parts of the pro¬ 
gram, but if it is coded as a subroutine, only one implementation is necessary. 
Furthermore, subroutines can be nested, with one subroutine calling another. For 
example, a call to a cosine subroutine will invariably have recourse to the use of 
the sine function. 

Sets of useful subroutines are often organized in a library. These libraries are 
scanned at li nk time (see Section 7.2) and the relevant entries referred to in the 
user's program, extracted and added to the final code. To be used in this man¬ 
ner, each subroutine must be documented with well-defined parameter-passing 
protocols. Libraries may be built up by the user or be available as a commercial 
package. High-level languages usually come with several such packages. 

Aside from saving space, subroutines are the vehicle normally used to im¬ 
plement modular programming CD- A structured approach to hardware design 
decomposes the system into functional modules, for example oscillator, gate, 
counter, decoder, display. Each module has a relatively simple function and may 
be designed, implemented and tested as a separate entity, with the appropriate 
stimuli. This may not produce the smallest, most efficient circuit, but it is likely 
that the product will come to fruition earlier and be more maintainable due to its 
testability. 

The software module is analogous to its hardware cousin as it too can be 
inserted into its motherboard (the main program), takes one or more signals 
(parameters, e.g. x) and has an outcome (return values, e.g. sin(x)). A software 


123 


1 24 C FOR THE MICROPROCESSOR ENGINEER 


module, invariably in the form of a subroutine, is normally self-standing with its 
own area of code (usually in ROM) and data storage in RAM. Good programming 
techniques are used to enforce a single entry and exit point and a minimum of 
interaction with data areas used by other modules. 

The expression function is commonly used in high-level languages to describe 
a callable module. In Pascal the name procedure is reserved for the special case 
of a function that returns no value, that is a void function. In co mm on with 
assembly language, Fortran uses the name subroutine. Irrespective of the name 
used, assembly-level subroutines are normally used to implement these high- 
level modules. Thus an understanding of the structure of subroutines is the 
key to comprehending the operation of these important aspects of high-level 
languages. This is the objective of this chapter. 


5.1 The Call-Return Mechanism 

In essence, getting to a subroutine involves nothing more than placing the address 
of its opening instruction in the Program Counter (PC), that is doing a Jump or 
Branch. Thus, if we take as an example a subroutine which evokes a delay of 0.1 s 
(i.e. does nothing for 100ms) and starts at El OOh, then JMP OElOOh will transfer 
control. In practice the programmer will probably not know the absolute address 
of the subroutine, especially if it is hidden in a library. Flowever, a subroutine 
entry point is normally identified with a label, and the assembler or linker will 
evaluate the appropriate address, for example JMP DELAY (see Table [T?] . 

The problem lies not in getting there, but returning afterwards. As can be seen 
from Fig. 15.11 the jumping-off point may be from anywhere in the main program 
or indeed from another subroutine — the latter process is known as nesting. 
Thus the microprocessor (MPU) needs to remember the value of its PC (which is 
already pointing to the instruction following the Jump or Branch after its fetch) 
before its contents are overwritten. 

One possibility is to move the contents of the PC to a designated memory 
location or Address register, for example LEAX 0, PC (Load Effective Address 
0 + PC to the 6809'sX register) or LEA 0(PC) , AO (Load Effective Address 0 + PC 
into the 68000's AO register). Then the subroutine can be terminated by moving 
t hi s pre-saved jumping-off address back to the PC (JMP 0,X or JMP (AO)). 

This approach breaks down when a subroutine wishes to call another, for 
the secondary subroutine will overwrite the return address of the primary. To 
get around this problem, the jumping-off address could be pushed down into a 
stack, rather than using a fixed register or memory location. As each subroutine is 
called, this Stack Pointer is moved down automatically by the appropriate number 
of bytes. Returning inwards simply involves the mirror operation of pulling up 
out of the stack back into the PC. The Stack Pointer moves up accordingly. This 
last-in first-out sequence, necessary for nesting, exactly describes the structure 
supported by the System stack/Stack Pointer. 

Using this technique gives us: 


THE CALL-RETURN MECHANISM 125 



Main flow 


(a) Calling up a subroutine twice. 



Main flow 


(b) Nested subroutine with Main calling SRI, which In turn 
calls SR2, which in turn calls SR3. SRI also calls SR3. 


Figure 5.1 Subroutine calling. 


PSHS PC 1 

JMP DELAY j = JSR DELAY . PULS PC = RTS 

for the 6809 MPU, and 
PEA 0(PC) 1 

JMP DELAY f = JSR DELAY . JMP C5P:>+ = RTS 

for the 68000 MPU. 

Notice how we simulated a Pull operation for the 68000 MPU, which does 
not have an explicit Pull instruction. The Post-Increment Indirect address mode 
operation on A7 (the System Stack Pointer) causes the SSP to move up (4 bytes) 
after the data (the return address) has been extracted. By definition, the Jump 
operation puts this extracted address in the Program Counter. 

Calling and returning from a subroutine is a sufficiently frequent operation 
to warrant the specific Call and Return instructions of Table 15.11 These have 
exactly the same outcome as the generalized approach shown above. Jump to 
Subroutine (JSR) and its relative Branch to Subroutine (BSR) push the re¬ 
turn address on to the System stack before going off. The BSR variants follow 
the same rules as ordinary Branches (see Sections 2.2 and 4.2) and of course 
generate position-independent code (PIC). Return from Subroutine (RTS) pulls 
the return address back from the System stack. 80x86 microprocessors use the 
mnemonics CALL and RET for the same purpose. 
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Table 5.1 Subroutine instructions. 


Operation 

Mnemonic 

Description 

Call 


Transfer to subroutine 

Jump to subroutine 

JSR ea 

Push PC onto Stack, PC <- <ea> 

Branch to subroutine 1 



short 

BSR offsetg 

Push PC onto Stack, PC <- PC + sex | offsetg 

long 

LBSR offsetig 

Push PC onto Stack, PC <- PC + offsetig 

Return 


Transfer back to caller 

from subroutine 

RTS 

Pull original PC back from Stack 


(a) 6809 instructions. 


Call 


Jump to subroutine 

JSR ea 

Branch to subroutine 1 


short 

BSR offsetg 

long 

LBSR offset^ 

Return 


from subroutine 

RTS 

and restore CCR 

RTR 

Frame 


Make 

LINK An,#kkig 

Close 

UNLNK An 


Transfer to subroutine 

Push PC onto Stack, PC <- <ea> 

Push PC onto Stack, PC <- PC + sex | offsetg 
Push PC onto Stack, PC <- PC + sex | offset^ 

Transfer back to caller 
Pull original PC back from Stack 
Pull original CCR back from Stack 
Pull original PC back from Stack 

Maintain a frame for local variables 
An into Stack (save old Frame Pointer, An) 

An <- SP (Point An to Top Of Frame, TOF), 

SP <- SP + sex | kkig (SP to Bottom of Frame) 

SP <- An (move SP back to TOF), 

Pull An (Get old Frame Pointer from Stack) 


(b) 68000 instructions. 

Note 1: Available in signed 8-bit (+127, -128) and 16-bit offset (+32,767, -32, 768) 
varieties. Most assemblers can chose the appropriate versions automatically. 
The 68020 upwards have a full 32-bit offset Branch capability. 


From Fig. I5.2l we see that the action of 1SR/BSR and RTS on the System stack 
is the same for both 6809 and 68000 MPUs, except the latter requires four bytes. 
As is usual for Motorola MPUs, the lower byte is located in the higher address (i.e. 
the lower byte of the address is pushed out first). The 68000's SSP must always 
point to an even address, and this will be enforced even if a single byte is pushed 
out. 

As an example, consider a subroutine to give a 0.1 s delay. This is easily imple¬ 
mented by loading a constant into a register and decrementing to zero. Coding 
for the 6809 and 68000 processors is shown in Table [5721 Other than the termi- 
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SSP 


Stack 

(i) Before a Call 


4000h 

3FFFh 

3FFEh 

3FFDh 

3FFCh 

3FFBh 

3FFAh 



SSP I 



(a) The 6809 MPU. 
SSP- 



4000h 


4000h SSP 



3FFFh 

PCL Byte 3 

3FFFh 

PCL Byte 3 


3FFEh 

Byte 2 

3FFEh 

Byte 2 


3FFDh 

Byte 1 

3FFDh 

Byte 1 


3FFCh SSP 

PCH Byte 0 

3FFCh 

PCH Byte 0 


3FFBh 


3FFBh 


Stack 

3FFAh 

Stack 

3FFAh 

Stack 


3FFFh 

3FFEh 

3FFDh 

3FFCh 

3FFBh 


(i) Before a Call 


(ii) After a JSR or BSR 


3FFAh 

(iii) After a RTS 


(b) The 68000 MPU. 


Figure 5.2 Saving the return address on the Stack. The SSP assumed a priori set to 4000h. 

nating RTS, the programs are perfectly normal routines. Strictly, in calculating 
their delay, the time to get to the subroutine should be considered, and this can 
differ according to how far away the subroutine is from the caller and which Call 
instruction and/or address mode is used. This also illustrates that there is a time 
overhead in using a subroutine, and where speed is of the essence, in-line code 
should be used. 

Notice that in both cases illustrated in Table lT2l one of the registers (X or AO) 
will be returned in an altered state, the same being true of the Code Condition reg¬ 
ister (CCR). Provided that such changes are well documented, this will frequently 
be of little consequence. However, it is often preferable to make subroutines 
transparent in that all registers, or perhaps a subset, remain unaltered. This can 
be accomplished by pushing all relevant registers into a stack at the beginning 
of the subroutine and pulling them out again just before the final exit RTS. This 
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Table 5.2 A simple subroutine giving a fixed delay of 100 ms when called. 

1 .processor m6809 

2 ; *********-**-*******-*-************-*********-*********-**-‘*** 

3 ; 

4 ; 

5 ; 

6 ; 

7 

8 E000 

9 E003 

10 E005 

11 E007 

12 

(a) 6809 code; 1 MHz clock, ~ = 1 ps. 

1 .processor m68000 

2 ———————————————————— 

3 

4 

5 

6 

7 .define N = (200000-8-14)/14 

8 000400 303C37CC DELAY: move.w #N,d0 ; Delay factor, 8~ 

9 000404 5340 DL00P: subq.w #l,d0 ; Decrement , Nx4~ 

10 000406 66FC bne DL00P ; to zero , Nxl0/8~ (taken/not) 

11 000408 4E75 rts ; , 16~ 

12 .end 


(b) 68000 code: 8 MHz clock, ~ = 0.5 fis. 


is easy in the 6809 MPU, as any combination of registers, including the CCR, can 
be Pushed or Pulled with a single instruction, see Table [sTSta). There is a slight 
problem with the 68000 MPU. The M0VEM instruction used for Pushing and Pulling 
only acts on Address and Data registers. There is a MOVE SR, -(SP) instruction 
which copies the whole Status register, of which the CCR is the lower byte. The 
opposite Pull operation is supported, that is MOVE (SP) +, CCR! Although the lat¬ 
ter only pulls out a byte, the SSP moves up two bytes. This is necessary to obey 
the rule that the SSP always points to an even address, and thus preserves the 
integrity of the System stack. Interestingly the 68010 and higher family mem¬ 
bers have gained the missing MOVE CCR,<ea> instruction, which matches the 
MOVE <ea>, CCR instruction. 

From Table f5TTT b). we see that the 68000 family has a second Return instruc¬ 
tion, RTR (Return and Restore CCR). This is used as an equivalent to the se¬ 
quence: 

MOVE (SP)+,CCR 
RTS 

and assumes that the CCR has been saved out onto the System stack at the be¬ 
ginning of the subroutine before any other stack-based operations have altered 
the SSP. Notice from Table fOl b) that the CCR is saved first (line 8), before the 
Data register is Pushed. The Pull sequence at the end of the subroutine is then 
in the reverse order. Failure to observe this can lead to spectacular crashes! The 
equivalent instruction PULS CCR, PC is sometimes used to terminate a 6809 sub¬ 
routine (see Table [83}. 


This subroutine does nothing and takes 0.1s to do it 
ENTRY : Non 

EXIT : D0.W Data register = 0000, CCR destroyed 


' This subroutine does nothing and takes 0.1s to do it 
■' ENTRY : Non 

•' EXIT : X Address register = 0000, CCR destroyed 

.define N =12500-(5+3/8) 

8E30D3 DELAY: idx #N ; Delay factor, 3~ 

301F DL00P: leax -l,x ; Decrement , Nx5~ 

26FC bne DL00P ; to zero , Nx3~ 

39 rts ; , 5~ 

. end 
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Figure 5.3 The stack when executing the code of Table \5.3\t b). viewed as word-oriented. 


(iv) MOVE.W D0,-(SP) (Push) (v) MOVE „ W (SP)+,D0 (Pull) (vi) RTR (Pull) 
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Table 5.3 Transparent 100 ms delay subroutine. 
.processor m6809 


* This subroutine does nothing and takes 0.1 s to do it 

* ENTRY : None 

* EXIT : No change 

.define N =12500-(5+3+7+7/8) 


8 

E000 

3411 DELAY: 

pshs 

x, cc 

Save Address 

reg and CCR 

, 7~ 

9 

E002 

8E30D2 

1 dx 

#N 

Initial delay 

factor 

, 3~ 

10 

E005 

301F DL00P: 

1 eax 

-1, X 

Decrement 


, Nx5~ 

11 

E007 

26FC 

bne 

DL00P 

to zero 


, Nx3~ 

12 

E009 

3511 

pul s 

X , cc 

Get registers 

back 

, 7~ 

13 

E00B 

39 

rts 



, 5~ 

14 



. end 






(a) 6809 code. Note lines 1 2 and 1 3 could be replaced by pul s x, cc, pc. 

.processor m68000 


* This subroutine does nothing and takes 0.1 s to do it 

* ENTRY : None 

* EXIT : No change 


1 
2 

3 

4 

5 

6 

7 

8 000400 40E7 DELAY: 

9 000402 3F00 

10 000404 303C37CC 

11 000408 5340 DL00P: 

12 00040A 66FC 

13 00040C 301F 

14 00040E 4E77 

15 


.define N = (200000-8-18-14-8-8)/14 


move sr,-(sp) 
move.w d0,-(sp) 
move.w #N,d0 
subq.w #l,d0 
bne DL00P 
move.w (sp)+,d0 
rtr 
end 


Save CCR (in SR) 
and Data reg d0(15:0) 
Initial delay factor 
Decrement 
to zero 

Retrieve old d0(15:0) 
Retrieve CCR then RTS 


14~ 

8 ~ 

8 ~ 

Nx4~ 

Nxl0/8~(taken/not) 

8 ~ 

20 ~ 


(b) 68000 code. Note rtr is equivalent to 6809 code pul s cc, pc. 


Apart from its convenience, transparency is necessary to support the recur¬ 
sive use of a subroutine. A subroutine is recursive if it calls itself. Clearly register 
variables used in the subroutine will be wiped out when used again by the next 
recursion. Similarly, static memory locations cannot be used to store variables 
for a subroutine which is to be recursive, but variables can be saved in a stack, 
as shown in the next section, where they are known as automatic variables. 


5.2 Passing Parameters 

The simple fixed-delay subroutine used as the example in the previous section is 
unusual, in that no information was passed from the caller and none returned. 
Another example of a double-void subroutine would be a function actuating 
an external relay, where the very act of calling is sufficient. The actuation is 
sometimes referred to as a side effect. 

Consider the situation where the total delay is to be an integer (0 to 65,535) 
multiple of 0.1 seconds, depicted as DELAY(Z), where Z is the aforementioned 
integer passed to the subroutine by the caller. In Table f5~4l it is assumed that the 
caller has set up the D1 .W register accordingly. Thus to invoke a 1 s delay, the 
call would be something like this: 












PASSING PARAMETERS 1 31 


Table 5.4 Using a register to pass the delay parameter. The call-up sequence shown above passed a 
constant (ten) to the subroutine. 

1 .processor m68000 


L 

3 

4 

5 

6 

* 

This subroutine does nothing and takes ZxO.ls to do it 

EXAMPLE : Z = 10; delay = 1 second 

ENTRY : Z passed in lower 16 bits of D1 

EXIT : Dl(15:0) = FFFF, D2(15:0) = 0000, CCR destroyed 

* 

8 




. defi ne 

N = (200000-8-10)/14 


9 000400 

6008 

DELAY: 

bra 

L00PTEST 

Check Z = 0 , 

10- 

10 000402 

303C37CC 

0UTERL00P: 

move.w 

#N, dO 

100ms delay factor , 

8~ 

11 000406 

5340 

INNERL00P: 

subq.w 

#1, dO 

Decrement , 

Nx4~ 

12 000408 

66FC 


bne 

INNERL00P 

to zero , 

Nxl0/8 

13 00040A 

51C9FFF6 

L00PTEST: 

dbf 

dl,0UTERL00P 

One less 100ms click, 

10/14- 

14 00040E 

4E75 


rts 


, 

16- 

15 




. end 




MOVE. 

W 

#10,D1 

; Ten ticks = 1 

second 



BSR 


DELAY 

; Go to 

it! 





The coding itself uses an inner loop (lines 11 and 12) identical to that in Ta¬ 
bles [572] and 15.31 with DBF being employed in conjunction with D1 .W (i.e. Z) to 
count the number of passes through this inner core (i.e. 0.1 s ticks). This DBF 
Decrement and Test is exercised immediately the subroutine is entered, to en¬ 
sure a speedy exit should Z be zero. The delay due to line 9 only happens once, 
and can be thought of together with the caller's JSR/BSR as a constant error in¬ 
dependent of Z. No data is returned from this void subroutine. 

If the delay parameter is a variable, for example data read from an analog to 
digital converter, and stored somewhere in memory at MEM_Z, then: 

MOVE.W MEM_Z,D1 ; Copy the delay variable to D1 

BSR DELAY ; to pass to subroutine 

will do the necessary. Note that the parameter passed is a copy of the variable 
(still in MEM_Z), not the variable itself. Thus when D1 .W is decremented in the 
subroutine, Z will not be altered, just its clone. Passing copied parameters is 
known as call by value 0. We will look at ways of directly affecting variables 
through a subroutine later. 

Using registers to pass parameters is convenient, fast and efficient. Further¬ 
more, with some modification, it is suitable for recursion (subroutines that call 
themselves), supports re-entrant code (subroutines which can be interrupted and 
then called again by the service routine, see Section 6.1) and is position indepen¬ 
dent. Its main problem is lack of generality, as the complement, range and type 
of registers available vary considerably between devices. Thus the 6502 MPU has 
two 8-bit Address registers and one 8-bit Data register, the 8086 with four 16- 
bit Data registers and three 16-bit Address registers, while the 68000 has eight 
32-bit registers each of both types. This is especially a problem with high-level 
language compilers, which attempt to be portable between processors. 
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Table 5.5 Using a static memory location to pass the delay parameter. 
.processor m68000 

* This subroutine does nothing and takes ZxO.l s to do it 


4 

5 

6 

. * 

. * 

. * 

EXAMPLE : 
ENTRY : 
EXIT : 

Z = 10; delay = 1 s 

Z passed in memory location 6000/6001h 

Dl(15:0) = FFFF, D2(15:0) = 0000, CCR destroyed 

* 

* 

* 

7 

8 




defi ne 

N = (200000-8- 

-10)/14 


9 

000400 

32386000 

DELAY: 

move.w 

6000h,dl 

Get delay parameter, 

12~ 

10 

000404 

6008 


bra 

L00PTEST 

Check Z = 0 , 

10- 

11 

000406 

303C37CC 

0UTERL00P: 

move.w 

#N, dO 

100 ms delay factor, 

8~ 

12 

00040A 

5340 

INNERL00P: 

subq . w 

#1, dO 

Decrement , 

Nx4~ 

13 

00040C 

66FC 


bne 

INNERL00P 

to zero , 

NxlO/8 

14 

00040E 

51C9FFF6 

L00PTEST: 

dbf 

dl,0UTERL00P 

1 less 100 ms click, 

10/14- 

15 

000412 

4E75 


rts 


J 

16- 


16 


. end 


Another technique, used especially with MPUs having a small complement of 
registers, is to use assigned memory locations as a common area between caller 
and subroutine. Where the location is fixed, this is known as static allocation. 
In Table [531 a single memory word is used to pass the static variable Z, with the 
caller copying the delay parameter thus: 

MOVE.W MEM_Z,6000h ; Copy the delay variable from memory 

BSR DELAY ; to pass to the subroutine via 6000h 

If MEM_Z was actually the common memory location, then this copy would not 
need to be made, but care would have to be taken not to alter the variable itself 
(rather than the copy). 

The use of common static memory has the advantage of being able to pass 
large numbers of parameters and structures such as arrays. However, as these lo¬ 
cations are by definition fixed, such subroutines cannot be recursive or re-entrant. 
Also, unless different static locations are used for each subroutine, nesting can 
lead to unfortunate side effects as one subroutine inadvertently alters another 
subroutine's variables. This makes debugging difficult, as routines other than the 
one being tested may interact in unpredictable ways. Such common areas can be 
used to hold global variables, which are known throughout all linked program 
modules. 

Many of these problems can be overcome by using a stack to pass variables 
back and forth, or preferably putting the m th ere in the f irst place ||3j, 01. This 
situation is depicted in the listing of Table [T6l and Fig. |5.4| Now to call up DELAY, 
a copy of the delay variable Z is pushed onto the System stack before calling the 
subroutine. On return the System Stack Pointer must be moved back up again to 
balance this Push and be returned to its original position. Using LEA 2 (SP) , SP 
is an alternative to ADDQ #2 , SP (or ADDA +2 , SP), and can be used for operands 
up to 32,767. The 8086 MPU family has a convenient RET #n instruction which 
is equivalent to LEA +n(SP) ,SP after a RTS. Similarly, the 68010 and up has a 
RTD #n equivalent (Return and Deallocate parameters) where n is a 16-bit 
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Table 5.6 Using the stack to pass the delay parameter. 
.processor m68000 


* This subroutine does nothing and takes ZxO.l s to do it 


EXAMPLE 

ENTRY 

EXIT 


Z = 10; delay = 1 s 

Z passed in Stack at SP+4/SP+5 

Dl(15:0) = FFFF, D2(15:0) = 0000, CCR destroyed 


8 




. defi ne 

N = (200000-8- 

-10)/14 


9 

000400 

322F0004 

DELAY: 

move.w 

4(sp) ,dl 

Get delay parameter, 

12- 

10 

000404 

6008 


bra 

L00PTEST 

Check Z = 0 , 

10- 

11 

000406 

303C37CC 

0UTERL00P: 

move.w 

#N, dO 

100 ms delay factor, 

8- 

12 

00040A 

5340 

INNERL00P: 

subq .w 

#l,d0 

Decrement , 

Nx4~ 

13 

00040C 

66FC 


bne 

INNERL00P 

to zero , 

NxlO/8 

14 

00040E 

51C9FFF6 

L00PTEST: 

dbf 

dl.OUTERLOOP 

1 less 100 ms click, 

10/14- 

15 

000412 

4E75 


rts 


J 

16- 

16 




. end 





SSP 


MOVE.W MEM_Z,-(SP) 


SSP 


BSR DECAY 


SSP 


Go 



SP+6 

SP+4 

SP+2 

SP 



Return 


Figure 5.4 The Stack corresponding to Table t 5. 6\ 
immediate parameter sign-extended to 32 bits. 

MOVE.W MEM_Z,-(SP) ; Copy delay variable to the System stack 

BSR DELAY ; to pass to the subroutine 

LEA +2(SP),SP ; Clean up stack after return 

Comparing Tables [5TBl and [575l we see that the only change is of Address mode 
in line 9. From Fig. |5.4| we see that Z lies 4:5 bytes up from where the SSP points 
to on arrival. Its effective address is thus 4(SP). 

Passing parameters using dynamic allocation permits nesting, recursion and 
re-entrancy as the SSP automatically moves down for each call and up again on 
each return. Essentially such variables are local (sometimes called automatic) and 
are known only to their own subroutine. The technique is general to all processors 
supporting a stack, and is used by block-structured high-level languages such as 
Algol, Pascal and cm. It is also possible to return values on a stack in a similar 
manner. 

All our examples so far have involved copying the value of a variable to pass to 
the subroutine. The actual variable itself is somewhere out in read/write memory 
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Table 5.7 Making a copy of a block of data of arbitrary length. 
.processor m68000 




3 

* Copi es 

4 

* ENTRY 

5 

* ENTRY 

6 

* ENTRY 

7 

* EXIT 

8 

* EXIT 

9 

* EXIT 

10 

* EXIT 

11 

******** 

12 



a block of data from one area (e.g. ROM) to another (e.g. RAM) 
: Constant LENGTFI passed in SP+22/23 (up to 65,535) 

: Constant address RAM_START passed in SP+18/19/20/21 
: Constant address R0M_START passed in SP+14/15/16/17 
: Block of data from R0M_START to R0M_START+(LENGTH-1) 

: copied to RAM_START to RAM_START+(LENGTH-1) 

: DO.W = FFFFh if copy successful 

: else LENGTH-DO.W is number of successful bytes transferred 


13 000400 42E7 BL0CK_C0PY: 

14 000402 48E740C0 

15 ; Get length parameter 

16 000406 302F0016 

17 00040A 4A40 

18 00040C 6714 

19 00040E 5340 

20 ; Now do the move loop 

21 000410 206F0012 

22 000414 226F000E 

23 000418 1219 CLOOP: 

24 00041A 1081 

25 00041C B218 

26 00041E 56C8FFF8 

27 ; Pass here IF LENGTH i 

28 000422 48DF0302 EXIT: 

29 000426 4E77 

30 


move CCR,-(SP) 

movem.l A0/A1/D1,-(SP) 
and check for zero 
move.w 22(SP),DO 
tst.w DO 
beq EXIT 

subq.w #1,D0 


movea.1 
movea.1 
move.b 
move.b 
cmp. b 
dbne 
s zero, 
movem.1 
rtr 
end 


18CSP),A0 
14(SP) ,A1 
(A1)+,D1 
Dl,(A0) 

(A0)+,D1 
DO,CLOOP 
OR error occurs 
(SP)+,A0/A1/D1 


Save CCR 

Save used registers 

Get LENGTH out from stack 
Is it zero? 

IF yes THEN exit 

ELSE redress DBNE'S n+1 loop 

Point A0 to RAM 
Point A1 to ROM 
Move byte ROM to Dl 
and hence up to RAM 
Did it get there ok? 

IF so THEN dec. and repeat 
OR copy is finished 
Restore registers 
and CCR before return 


and is not altered by processes in the subroutine. It is possible to use a subroutine 
to affect a variable directly by passing the address of that variable. This is known 
as call by reference 0. Now that the subroutine knows where the variable lives, 
it can be modified. Passing addresses is also useful in pointing out to a subroutine 
where a large data structure, such as an array, is stored without having to send 
all its elements over. Only a pointer to the first element and its length need be 
passed. 

A rather more sophisticated example of a program making use of a stack to 
pass both a copy of a variable and pointers is given in Table [5771 The program 
specification is to make a copy of a block of data from one area of memory to 
another area of read/write memory. Parameters passed are pointers to the 
start of the source and destination blocks, and the length of the original block 
(assumed to be not greater than 64 kbytes). A successful copy is signalled by 
returning the code -1 (FFFFh) in DO.W. If any copy action is unsuccessful, then 
the subroutine is exited with DO.W holding the block length less the number 
of successfully transferred bytes. The caller can then subsequently calculate 
LENGTFI - DO.W to give the number of bytes actually transferred. Other than the 
error status return, all other registers are to be unaltered. A typical application 
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Figure 5.5 The Stack used for the BL0CK_C0PY subroutine. 


of such a subroutine would be to copy a table of initialized variables stored in 
ROM by a compiler to RAM where they can be modified later (see Table ITT). 1 2\ . 
Initial values cannot be stored in RAM, as such memory is volatile. Usually the 
compiler will generate the necessary constants, such as block start addresses and 
length, at link time. 

The core of the program is contained in lines 21 to 26 of Table [5/71 Each byte 
is moved directly from memory to memory using Address registers to point to 
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the two locations. A comparison tests for a successful copy as well as advancing 
the pointer. The DBNE loop control exits if it is true that the two bytes are not 
equal (i.e. unsuccessful) otherwise decrements the count in DO.W (originally set 
to LENGTH - 1 in line 19) and repeats. The residue in DO.W will be FFFF/i if each 
copied byte is verified, otherwise its exit state reflects the number of loop passes 
taken. 

The System stack, as seen in Fig. 15.51 is used for three purposes. Firstly the 
three parameters are pushed out prior to the call, in a sequence such as: 


MOVE.W 

#LENGTH, 

-(SP) 

MOVE.L 

#RAM_START, 

-(SP) 

MOVE.L 

#R0M_START, 

-(SP) 

BSR 

BL0CK_C0PY 


LEA 

10(SP),SP 



Word length parameter pushed (2 bytes) 
Pointer to start of RAM pushed (4 bytes) 
Pointer to start of ROM pushed (4 bytes) 
Go to it 

After return, clean up Stack 


Then the actual call places the PC on the System stack automatically. Finally, as 
the subroutine is to be transparent, the System stack is used to save any used 
registers, apart from DO. 

The code shown in Table [577] uses offsets from the SSP to obtain the three 
parameters, for example MOVE. W 22 (SP) , DO. This can cause problems, since in 
the body of many subroutines, the SSP is used to Push and Pull temporary results 
of evaluation into and out of the System stack. In particular local variables (that 
is variables used only by the subroutine and forgotten about after return) are also 
frequently kept on this stack. All this means that the parameter offsets from the 
SSP will be in a constant state of flux. To get around this problem another Address 
register is frequently pointed to the top of the System stack at the beginning of 
the subroutine and this remains as a fixed point of reference for the duration of 
the subroutine, irrespective of what is happening to the SSP. This is known as the 
Frame Pointer (FP), with the space used on the System stack after entry being the 
Frame. 

Our final example is used to illustrate the concept of a Frame. Consider a 
subroutine where an analog signal must be sampled as rapidly as possible for a 
variable number of times, using an 8-bit analog to digital converter, after which 
the resulting array is to be processed in some manner. Typical processes are 
filtering, averaging and peak detection. To keep our program as simple as pos¬ 
sible, we will assume that we wish to return the simple sum of not more than 
256 of these samples. To comply with the injunction that sampling should be as 
quick as possible, it will be necessary to allocate space to store temporarily up 
to 256 bytes. After this burst of sampling, the process can be carried out on the 
array now in situ in this RAM buffer. 

Our first implementation is based on the 6809 MPU, as an example of a proces¬ 
sor without any specific Frame-handling instructions. The System stack reflecting 
the coding of Table fsTHl is shown in l iu. l5.6l . The variable i representing the num¬ 
ber of samples to be taken is pushed on to this stack in the normal way prior to 
the subroutine call. The subroutine itself commences by saving the contents of 
the User Stack Pointer (USP) on the System stack. The USP is to point to the Top 
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Figure 5.6 The 6809 System stack organized by the array averaging subroutine. 


Of Frame (TOF) and is thus to be the Frame Pointer. Transferring the contents of 
the System Stack Pointer (SSP) to the USP effectively points the Frame Pointer to 
the TOF, and then the SSP is moved down 257 bytes, one to hold the temporary 
(local) variable holding the count and 256 for the array (lines 11 -13). At this 
point, the SSP points to the bottom of the frame (BOF) but, as all references in Ta- 
hle l5.8l u.se the Frame Pointer (e.g. line 21, DEC -1, U), it can be used subsequently 
for other purposes. 

After the body of the subroutine, the Frame is closed by copying the Frame 
Pointer to the SSP — that is moving it up to the TOF — and pulling out the old 
Frame Pointer, before RTS (lines 34 and 35). Of course, after return the System 
stack will need to be cleaned up to compensate for passing i. 
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1 

2 

3 

4 

5 

6 

7 

8 
9 

10 


Table 5.8 Using a frame to acquire temporary data; 6809 code. 
.processor m6809 


Burst acquires up to 256 analog samples and returns the sum 

ENTRY : i is the number of samples on the stack 

EXIT : The sum of i 8-bit samples in Accumulator D 

EXIT : X, CCR altered. U is used as the Frame Pointer 


. defi ne 

First make the frame 


A_D = 6000h ; Where the A/D converter lives 


11 

E000 3440 ARRAY AV 

pshs 

u 

Save current USP on stack, old FP 

12 

E002 1F43 

tfr 

s,u 

Point Frame Pointer to T0F 

13 

E004 32E9FEFF 

leas 

-101h,s 

Open frame of 257 bytes, SP to B0F 

14 

; Initialize to acquire data 


15 

E008 E644 

ldb 

4, u 

Copy i into frame 

16 

E00A E75F 

stb 

-1, u 

to initialize count (= i) 

17 

E00C 305F 

1 eax 

-1, u 

X to just above ARRAY[0] (array ptr) 

18 

; Burst sample 




19 

E00E F66000 GET_L00P 

ldb 

A_D 

Get data 

20 

E011 A782 

sta 

, -x 

Put it in frame, decrement pointer 

21 

E013 6A5F 

dec 

-1, u 

count = count - 1 

22 

E015 26F7 

bne 

GET_L00P 

and repeat 

23 

; Initialize to sum 

data 



24 

E017 E644 

ldb 

4, u 

Copy i back into frame again 

25 

E019 E75F 

stb 

-1, u 

to initialize count (= i) 

26 

E01B 305F 

1 eax 

-1, u 

Point X to just above ARRAY[0] again 

27 

E01D 4F 

cl ra 


Clear sum (Acc.D) 

28 

E01E 5F 

cl rb 



29 

; Now do the summation 



30 

E01F E382 ADD_L00P 

addd 

, -X 

Add byte to sum, decrement pointer 

31 

E021 6A5F 

dec 

-1, u 

count = count - 1 

32 

E023 26FA 

bne 

ADD_L00P 

and repeat 

33 

; Close frame 




34 

E025 1F34 

tfr 

u, s 

Move SP back up 

35 

E027 3540 

pul s 

u 

and get back old frame pointer, USP 

36 

E029 39 

rts 


and return 

37 


. end 




The core program in lines 15 - 32 is unremarkable. The Frame Pointer is copied 
into the X Index register to permit the use of the Pre-Decrement Index Address 
mode in stepping through the array, yet leaving the Frame Pointer untouched 
(lines 17 and 20). The passed parameter i is copied into the Frame to initialize the 
loop counter in both instances (lines 16 and 25). It would be more efficient to use 
an Accumulator as a loop counter, but the 6809 MPU does not have enough regis¬ 
ters to make the use of such register variables a feasible proposition. One quirk 
exhibited by this implementation is the need to pass i = 0 to sample 256 times, 
as a byte can only represent up to 255. 

The 68000 System stack of Fig. 15.71 reflecting the code in Table fsTol is very sim¬ 
ilar to its 6809 counterpart. This time i is passed as a word to preserve the even¬ 
ness of the System Stack Pointer (a byte sized Pre-Decrement/Increment M0VEM 
via A7, i.e. Push and Pull, always results in a word being transferred to/from the 
System stack, the upper byte of which is null). 
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Figure 5.7 The 68000 System stack organized by the array-averaging subroutine. 


The coding shown in Table 15.91 is designed to reflect the 6809 equivalent, 
rather than using the more efficient features of the 68000, such as DBF. The 
LINK A6,#102h instruction in line 11 replaces the three equivalent 6809 instruc¬ 
tions in lines 11-13 of Table ©8l The old Frame Pointer (A6 in this example, but 
any Address register except A7 could be used) is firstly saved in the System stack. 
Then it is overwritten by the SSP to become the new Frame Pointer to TOF. Finally, 
the SSP is moved down to open the 102-byte Frame. The opposite UNLinK (UNLK) 
instruction of line 30 undoes these three actions also in one go. Table ©Ti b) lists 
the behavior of this pair of instructions. Note that LINK An ,#kk is a word op¬ 
eration, with kk being sign extended to a 32-bit constant and then added to SSP. 
Effectively this limits the frame size to 32,768 bytes. With relatively little mod¬ 
ification, the code given below could deal with sampled arrays of this size. The 
68020 MPU has a long LINK variant. 
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Table 5.9 Using a Frame to acquire temporary data; 68000 code. 
.processor m68000 


Burst acquires up to 256 analog samples and returns the sum 


ENTRY 

EXIT 

EXIT 


n is the number of samples on the stack 
The sum of i 8-bit samples in Data register D7 
AO, DO.W and CCR altered. A6 is used as the Frame Pointer 


9 


. defi ne 

A_D = 6000h ; 

Where the A/D converter lives 

10 

; First make the frame 




11 

000400 4E560102 ARRAY_AV: 

link 

a6,#102h ; 

Make 258-byte frame, A6 as FP 

12 

; Initialize to acquire 

data 



13 

000404 3D6E0008FFFE 

move.w 

8(a6),-2(a6); 

Copy i into frame 

14 

00040A 41EEFFFE 

1 ea 

-2(a6),a0 ; 

Point A0 to just above ARRAY[0] 

15 

; Burst sample 



16 

00040E 11386000 GET_L00P: 

move.b 

A_D,-(a0) ; 

Get data into frame & dec pntr 

17 

000412 536EFFFE 

subq 

#l,-2(a6) ; 

count = count - 1 

18 

000416 66F6 

bne 

GET_L00P ; 

and repeat 

19 

; Initialize to sum data 



20 

000418 3D6E0008FFFE 

move.w 

8(a6),-2(a6); 

Copy i back into frame again 

21 

00041E 41EEFFFE 

1 ea 

-2(a6),a0 ; 

A6 to just above ARRAY[0] anew 

22 

000422 4247 

cl r .w 

d7 ; 

Clear sum (D7) 

23 

000424 4240 

cl r ,w 

dO ; 

Use DO to extend byte to word 

24 

; Now do the summation 




25 

000426 1020 ADD_L00P: 

move.b 

-(aO),dO ; 

Extend ARRAY[n] to word, ptr-- 

26 

000428 DE40 

add .w 

dO,d7 ; 

Add to word sum 

27 

00042A 536EFFFE 

subq 

#1,-2(a6) ; 

count = count - 1 

28 

00042E 66F6 

bne 

ADD_L00P ; 

and repeat 

29 

; Close frame 




30 

000430 4E5E 

uni k 

a6 ; 

SSP back up and restore old FP 

31 

000432 4E75 

rts 

1 

and return 

32 


. end 




The core of the program is straightforward, with the only problem lying in 
lines 25 and 26. Here a byte sample is to be added to a word sum. As both source 
and destination operands must be the same size, the byte variable is promoted 
to word size by moving into previously cleared DO.W. This is then added to D7.W. 
In stepping an Address register through the array, AO fulfils the same role as the 
X Index register in the 6809 equivalent, leaving the Frame Pointer A6 untouched 
(lines 16 and 25). 

The 68000 family are blessed with a generous complement of registers. It 
would thus be more efficient to use a Data register to hold the loop counter 
rather than operate directly in memory. The C high-level language allows the 
programmer to declare local (known as Auto) variables as Register variables. The 
compiler will then make an attempt to lodge such variables in a register. 

The last two examples have returned their single parameter in a Data register. 
High-level languages such as Pascal and C permit only one return variable, which 
is defined as the value of the function. Thus expressions in C such as: 

if (block_copy(rom_start, ram_start, length) = -1) 

{do this, as no error has occurred;} 

el se 

{do that, on an error situation;} 
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are possible, where function block_copy() (see Table HTTl is called up (with 
the passed parameters indicated in brackets) and its value compared to -1. Its 
value' is in fact the returned value. 

In C and Pascal, larger numbers of variables can be altered by passing pointers 
(as in this example) or by declaring variables as global. Global variables are stored 
in fixed RAM locations, and are thus accessible to any function. 

The System stack itself may be used to pass back multiple variables. In such 
cases, room is normally left on this stack, just below the pass-to variables, be¬ 
fore moving control to the subroutine. On return, the SSP will then point to the 
returned parameters, which can be extracted before the stack is cleaned up. 
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CHAPTER 6 

Interrupts plus Traps equals 
Exceptions 


A microprocessor used as a controller spends much of its time detecting and 
measuring events happening in the outside world. These external events happen 
in their own time and are in no way synchronized to the MPU's internal processes. 
A simple example of this is shown in Fig. |6.1 1 where we wish to measure the time 
in 1 ms ' ticks’ between each cycle of an electrocardiograph (ECG or EKG) signal 
(heart wave). One possibility would be to use hardware to count 1 kHz oscillations 
and to detect the fiducial point [TJ; indeed this hardware could itself be a MPU- 
based circuit. When this reference point (the signal peak in the diagram) occurs, 
the master microprocessor must be alerted to the fact. A response must be made 
within 1 ms of the event, as the counter continues incrementing. 

One approach would be to use the peak detector's output to set a flag (latch). 
This latch output is buffered to the data bus, and can be accessed at some address. 
Thus the MPU could regularly read the flag at intervals of no less than 1 ms, and 
get the counter data only when the flag was set. Resetting the latch at this point 
prepares for the next event. However, in this example this will typically only 
happen around once per second, a 0.1% hit rate! This polling approach is fine if 
there are only a few events being measured and the background processing task 
is not too onerous. However, in this instance we may also be measuring blood 
pressure, temperature etc. for a whole ward of patients. In that case the MPU will 
spend most of its time polling, leaving little time for processing. 

To circumvent this problem, all MPUs have at least one input labelled Interrupt. 
When its Interrupt line is tugged (usually by going low or by a low-going edge) 
the MPU will temporarily suspend its operation and go to an interrupt service 
routine (ISR). This is just a subroutine entered via an external (hardware) signal. 
At the end of this routine, control is passed back to the background program. 
However, interrupts as seen from the MPU happen at random, so care must be 
taken that the machine state has not been disrupted when control does return. 
Furthermore, when several devices can request an interrupt, some means must be 
found to determine the source of the service request, and prioritize when more 
than one peripheral requires attention. 

All this refers to hardware-generated interrupts. Most MPUs can generate in¬ 
terrupts when some exceptional condition occurs internally, for example using a 
zero divisor for the DIVU and DIVS 68000 instructions. Allied to these traps are 
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explicit instructions which can cause the processor to act in much the same way 
as a hardware interrupt. These are sometimes known as software interrupts. A 
generic term for all hardware and software interrupts is an exception (for excep¬ 
tional circumstances). 

Processors handle exceptions in differing ways. In this chapter we will look 
at the general concepts involved in interrupt handling, and how the 6809 and 
68000 processors implement exceptions. 


6.1 Hardware Initiated Interrupts 

Although the minutia of the response to an interrupt request varies considerably 
from processor to processor, the following phases can usually be identified: 

1. Finishing the current instruction. 

2. Ignoring the request if the appropriate mask (if any) is set. 

3. Saving at least the state of the PC and CCR registers. 

4. Entering the appropriate service routine. 

5. Identifying the source of the interrupt (if not done in phase 4). 

6. Executing the defined task. 

7. Restoring the processor state and returning to the point in the program where 
control was first transferred. 

Interrupts are by definition asynchronous to system operation. Their apparent 
randomness means that the system response to such events must ensure that the 
interrupted program (the background program) is oblivious to the fact that the 
processor has ' gone away for a while’ to service an external request. In some ways 
this is akin to transparency in subroutines (see page ll26) but is more difficult to 
implement due to the erratic nature of the action. 

At the very least, transparency to interrupts demands that the state of the 
MPU must be saved before going to the interrupt service routine, and restored on 
exit. This implies that instructions be treated as indivisible, as saving the MPU 
state part of the way through an instruction is difficult and to my knowledge is 
not implemented by any current MPU. Thus, although an interrupt request signal 
may be internally latched by the MPU at any time, usually on a clock edge, it will 
not be examined until the end of the current instruction execution. 

As a consequence of this, care must be taken when dealing with data objects 
greater than the natural size of the processor. As an example, consider incre¬ 
menting a 4-byte variable N in 6809 code. Assuming that this is stored in mem 
to mem+3 we have: 


1 

LDD 

mem+2 

Add one to lower word 


2 

ADDD 

#1 

stored in mem+2:mem+3 


3 

STD 

mem+2 

Lower word now incremented 


4 

LDD 

mem 

Add carry to upper word stored 

in mem:mem+l 

5 

ADCB 

#0 

one byte at a time 


6 

ADCA 

#0 

as there is no Double Add with 

Carry 
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7 STD mem ; N++ at last! 

This is simple enough. But consider that N = FFFF FFFF/r. If an interrupt strikes 
in-between lines 2 and 7, and the interrupt service routine uses N, then the value it 
will see is FFFF 0000 h rather than 0000 0000h. Although problems like this can 
be avoided at assembly level, they are difficult to overcome when using high-level 
languages, as the machine-level code produced by the compiler is not directly 
under the control of the programmer. This is particularly true as high-level in¬ 
structions are not entities as seen by an interrupt. In general do not share data 
between interrupt service routines and other code, see Section 10.2. Flowever, 
avoiding the use of global variables is easier said than done. 

Most interrupts can be inhibited during sensitive moments', such as de¬ 
scribed above, by setting the appropriate mask in the Code Condition register. 
Specifically the 6809 MPU supports three interrupt lines. These are labelled in 
Fig. l6.21 a) as IRQ (for lnterrupt_ReQuest), FIRQ (for Fast_lnterrupt_ReQuest) and 
NMI (for Non_Maskable_lnterrupt). The former two are inhibited by mask bits I 
and F respectively. These are automatically set when the MPU is Reset, so that pe¬ 
ripheral interface devices and relevant variables can be allocated their initial state 
before dealing with an interrupt. The ANDCC instruction can be used at any point 
in the program to clear either or both mask bits, for example ANDCC #10101111b 
enables both IRQ and FIRQ lines. Conversely the 0RCC instruction can be used to 
inhibit, for example 0RCC #01000000b disables FIRQ. 

The 6809 has one non-maskable interrupt line. This cannot be locked out, and 
as such must be used with caution. Unlike IRQ and FIRQ which are activated by 

a low voltage level _at the appropriate pin, NMI is triggered by a low-going 

voltage \_ that is edge triggered. This voltage may stay low after the event, 

and will not cause another interrupt until the signal goes high and then low again. 
In the event of one type of interrupt being interrupted by another, the N MI will 
have top priority, that is N MI can interrupt an IRQ or FIRQ service routine, or even 
itself. IRQ has the lowest priority, and can be interrupted by a FIRQ, as well as 
NMI. As we shall see, the interrupt handling mechanism requires the use of the 
System stack. After the 6809 is Reset a NMI interrupt event is latched, but not 
acted upon, until the first load into the System Stack Pointer, which it is assumed 
sets up the System stack, for instance LDS #0400h. 

The interrupt structure of the 68000 MPU as shown in Fig. 16.21 b) is some¬ 
what more complex. Here too there are three interrupt lines, and in a mini mum 
system these can be used to give three different responses. However, the pro¬ 
cessor is actually designed to differentiate between seven different interrupt re¬ 
quests, which it interprets from the 3-bit pattern on the Interrupt Priority Level 
IPL2 IPL1 IPL0. Thus 1 00b (active low 01 1 b ) is considered a level 3 interrupt re¬ 
quest. A level 0 request (IPL2 IPL1 IPL0 = 111) is ignored (no interrupt), whilst 
level 7 is non-maskable, and like the 6809's NMI equivalent, is edge triggered, an 
edge here being defined as a transition from a lower level. 

The mask structure also echoes the level-oriented interrupt request. Three 
mask bits in the Status register (see Fig. 13. H set the level above which a request 


























HARDWARE INITIATED INTERRUPTS 147 


is honored. Thus if 12II10 is set at 100 (e.g. ANDI #lllll | 100 | llllllllb, SR) 
then any request from level 5 to 7 will result in the relevant internal IRQ line being 
activated. On Reset the three interrupt mask bits are set to 1 11, locking out all 
except level 7, the non-maskable interrupt. 
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Figure 6.3 Using a priority encoder to compress 7 lines to 3-line code. 

Interrupt request lines from three peripheral interfaces may be directly con¬ 
nected to IPL2 IPL1 IPL0, having level 1, 2 or 4 priorities. Up to seven interrupt 
sources can be handled using external circuitry to encode these lines to 3-bit 
binary. The most common approach shown in Fig. 16. 3 1 uses a 74LS148 priority 
encoder 0. This has eight active-low inputs and three active-low outputs. The 
74LS148 gives a 3-bit coded equivalent of the highest active input line. Thus if 
devices 6 and i simultaneously request service (1 01111 01 it), then the output 
will be 6 (001 b, active-low). Once device 6 has been serviced and its interrupt 
request line lifted, the 74LSF48's output wifi change to 1 1 Oh (active low 1), and 
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device 1 will then be eligible for service (if not masked out by 12II10). Similar 
considerations apply to the 68008 MPU, although as we can see from Fig. l3.2I IPL0 
and IPL2 are internally connected, effectively allowing only levels 2 (707), 5 (01 0) 
and 7 (000) to be acceded to. The higher the level of interrupt request, the greater 
is its priority. Thus if a level 5 interrupt is in progress, it can only be interrupted 
by a level 6 or 7 request. 

Once the MPU accepts an interrupt, it must change from executing the back¬ 
ground program, and move to the appropriate interrupt service routine or fore¬ 
ground program. This is si mi lar to switching to a subroutine, but the change¬ 
over is dictated by an apparently random call from outside. As this can happen 
anywhere in the background program, the state of all the MPU's registers (its 
context) used in the background program must be saved before the change-over. 
On return these are restored, leaving the state of the MPU unchanged. Making 
the interrupt process invisible in this manner allows the MPU apparently to ex¬ 
ecute more than one task in parallel. Multitasking in this manner is of course 
a serial process, and carries the overhead of the time to switch context between 
background and foreground 0. 

There are two approaches to context switching. At the very least the Program 
Counter and Code Condition register/Status register must be saved. The former, 
so that control can be passed back to the background program at the point of 
the break, as in the case of a subroutine call. The latter, because the CCR will be 
altered by any but the most trivial interrupt service routine. Any additional regis¬ 
ters altered by the service routine can be saved by Pushing and Pulling via a stack, 
in the manner shown in Table [5731 Some early microprocessors, such as the 6800, 
save all internal registers automatically on the System stack when an interrupt 
response is initiated and return them at the end. This entire-state context switch¬ 
ing is convenient, but in processors with a significant complement of registers, 
the resulting time overhead can have a noticeable impact on system response. 
This is not justified where only a few registers are actually used in the service 
routine. Early processors have few registers and/or stack-oriented instructions 
(the 6800 has one Address register, two Data registers and cannot directly Push 
or Pull the former), and thus an automatic whole-state context switch is efficient. 
Both types of context switching use the System stack to save the register states. 

The 6809 MPU has both partial and full context switching. The IRQ and NMI 
responses automatically cause all registers to be Pushed on to the System stack, 
in the order shown in Fig. 16.41 b). The FIRQ response saves only the PC and CCR, 
leaving the rest up to the programmer (see Table l6?IT b)). The E flag in the CCR is 
set after the Push if the Entire state has been saved. It is used by the ReTurn from 
Interrupt instruction, which terminates all 6809 interrupt service routines. RTI 
reverses the context switch and restores the MPU to its original state. 

The FIRQ response automatically sets the I and F mask bits in the CCR before 
entering its service routine, in order to ensure that it cannot be further inter¬ 
rupted by any other than the non-maskable interrupt. Only the I mask is set in 
the IRQ response. Consequently an IRQ service routine can be interrupted by 
a FIRQ response as well as a NMI. Of course when the old value of the CCR is 
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Figure [6~4l How the 6809 responds to an interrupt request (continued next page). 


(a) Simplified Interrupt processing flowchart. 
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FFF4;5 
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(b) Context switching, saving the state on the System stack. 


(c) Vector table. 


Figure 6.4 (continued) How the 6809 responds to an interrupt request. 
returned, these changes vanish. 

When a 68000 processor recognizes a level-n request, it saves the SR in a 
temporary internal register. Then the three interrupt mask bits are updated to 
level n, permitting only interrupts at a higher priority level to be further recog¬ 
nized during the level-n service routine. Also the T flag is cleared, to prevent 
Trace interrupts (see page 1 1641 . and the S flag is set. The latter means that the 
processor switches into the Supervisor state (if not already there). Thus when 
the PC and SR are saved, as shown in Fig. l6.Sl b). the Supervisor Stack Pointer (SSP) 
and not the User Stack Pointer (USP) is used to delineate the context stack. The 
SR saved in this manner is the original copied into the internal register and not 
the modified version. Thus the interrupt service termination Return from Ex¬ 
emption (RTE) (equivalent to the 6809's RTI) will move the processor back to the 
User state, if this was the interrupted state, as well as restoring the mask bits to 
their original value. 

With everything put away on the System stack, the processor is ready to go to 
the start of the appropriate service routine. The simplest approach to this is to 
have the entry addresses stored in predetermined locations. The 6809 MPU re¬ 
serves 14 bytes at the top of its memory space to hold the seven start addresses of 
its three hardware, three software and one Reset interrupt, as shown in Fig. l6.41 c). 
For example, when the MPU responds to an IRQ request, it will find the start of 
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Fiqure [6~5l How the 68000 responds to an interrupt request (continued next page). 
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Vector 

number - 1 - 

0 - ■ Startup value for SSP - 

1 - ■ Startup value for PC - 

2 - ■ Bus error 

- 1 - 

3 - ■ Address error 

-1- 

4-- Illegal instruction 

5 - ■ Divide by zero 

- 1 - 

6 --Outside bounds for CHK- 

-1- 

7 - ■ TRAPV instruction 

-1- 

8- - Privilege violation 

-1- 

9- * Trace exception 

10- - Line 1010 emulator ■ 

-1- 

11- ■ Line 1111 emulator ■ 

-1- 

12- ■ Unassigned (reserved) - 

13- - Unassigned (reserved) - 

14- - Unassigned (reserved) ■ 

15- - Unitialized Interrupt 


Address 

0000 N 


Vector 

number 


Reset 

FC=110 


• Trap 

#0 

• Trap 

#1 

■ Trap 

#2 

• Trap 

#3 

Trap 

#4 

• Trap 

#5 

• Trap 

#6 

• Trap 

#7 

• Trap 

#8 

• Trap 

#9 

• Trap 

#10 

• Trap 

#11 

• Trap 

#12 

• Trap 

#13 

• Trap 

#14 

• Trap 

#15 


Address 

0080 \ 


0094 CO 

c 

o 


Eight unassigned 
1 (reserved) vectors 


£ 7 * 16 unassigned 

48-03 (reserved) vectors 


24* * Spurious interrupt 
- 1 -- 

25* ■ Level 1 int autovector 
- 1 -- 

26* ■ Level2 int autovector 
-1-- 

27- - Level3 int autovector 

-1- 

28- • Level4 int autovector 

-1-- 

29- • Level5 int autovector 

- 1 -- 

30- • Level6 int autovector 

- 1 -- 

31- - Level7 int autovector 


64- 

* External 

int 

vector 

64 

65- 

- External 

int 

vector 

65 

66- 

• External 

int 

vector 

66 

67- 

- External 

int 

vector 

67 

68- 

• External 

int 

vector 

68 


- External 

int 

vector 

251- 

.03EC 

•External 

int 

vector 

252- 

03F0 

•External 

int 

vector 

253- 

03F4 

•External 

int 

vector 

254- 

03F8 

-External 

int 

vector 

255- 

03FC 

0400 


(c) Exception table (unassigned may be used for 68020+ processors). 


Figure 6.5 (continued# How the 68000 responds to an interrupt request. 


external interrupting devices (Software interrupts) 
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the IRQ service routine in FFF8:9/i. Normally this vector table is in ROM, and this 
is a necessity for the Reset vector in FFFE:F/i, as the address for the main routine 
must be present at power up (cold start). In systems where no actual memory 
exists at these locations, the Address decoder must be designed to enable phys¬ 
ical memory when these addresses are output by the MPU. If necessary, clever 
address decoding can be used to place locations FFF2 - FFFD/i in RAM where they 
may be dynamically altered by the program, although this is rare. 

As an example, consider an extension to the system shown in Fig. 16.11 An 
external 16-bit counter records 1 ms ticks, whilst a detector circuit records signal 
peaks. An array of 256 peak to peak times in mi lliseconds is to be displayed on 
an oscilloscope. Two digital to analog converters are to be used to drive the X 
and Y oscilloscope plates — see Fig. 111.31 The background program is to scan 
this array sending its analog equivalent to the Y plates at the same time as the 
X plates drive is being incremented from 0 to 255 (0 to full-scale analog). This 
occurs as a continuous loop, giving a flicker-free display. Whenever a peak is 
detected, the processor is to switch from its background display task to updating 
the array with the latest period. When the array is full (256 peaks), the process 
is to be repeated, over-writing the oldest values. Provided that this foreground 
task is accomplished quickly, this switch back and forth will not be noticed on 
the display. 

We need not concern ourselves with the details of the Address decoder nor the 
interfacing digital to analog converters here, but we must consider the problem 
of driving the MPU's interrupt input from the peak detector. Taking the 6809 MPU 
for our first solution, we will use the FIRQ input to keep the response time short. 
Now FIRQ (and IRQ) are active as long as their level is low. We have not specified 
the duration of the peak detector's active output, but in this situation it is likely 
to be anything up to 250ms, to avoid multiple triggering due to noise around 
the peak. Thus if FIRQ is still low after the return to the background program, 
then another interrupt response will be immediately set in train. In this case the 
whole 256-word array will probably be updated in one go! 

As shown in Fig. 16.61 interposing a D flip flop solves the problem. As the flip 
flop is edge-triggered, its D input is only clocked in on the falling edge (in this 
case). This interrupt flag is thus lowered'. After the processor vectors to the 
service routine, the act of reading the counter also activates the flip flop's Preset 
input, which sets it to logic I (raises the flag). Thus on return, the interrupt line 
is no longer active, irrespective of the indeterminate length of the source request. 
Edge-triggered interrupts, such as NMI, can be directly driven without using an 
external flag. 

Peripheral devices designed specifically to interface to a MPU normally incor¬ 
porate such flags as part of a Status or Control register. For example the 6821 PIA 
of Fig. ll.9l uses bits 6 and 7 of each Control register for this purpose ID- Reading 
the appropriate Data register clears these flags automatically. 

The 6809 code implementing our specification is shown in Table 16.11 This 
comprises three separate source modules: 
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6809/68000 
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Figure 6.6 Using an external interrupt flag to drive a level-sensitive interrupt line. 

1. The background module DISPLAY which extracts the 256 array values using 
them to drive the oscilloscope Y plates as it ramps up the X plates. This module 
runs continually except when interrupted by the foreground module. 

2. The foreground module UPDATE is entered only when an external event occurs. 
It reads the counter, evaluates the time since the last event, inserts the outcome 
in the array and moves the array index on one. 

3. The VECTOR module simply sets up the Interrupt and Reset vectors. The ac¬ 
tual values are put into memory at load time, that is when the EPROM is pro¬ 
grammed (or program downloaded into RAM in a Microprocessor Development 
System). It does not execute as such at run time, it is simply in situ in a sup¬ 
porting role to the two previous modules. 

Each of these three modules are separately assembled and subsequently linked 
together to give the listing of Table [tTT| We will discuss this linkage process in 
the following chapter, here it is sufficient to note that the assembler reserves 
256 words in its Data program space . psect _data (line 17 of Table RjTR a)), the 
start address of which is called ARRAY. This name is made known to the other sep¬ 
arately assembled modules through the linker by declaring it . publ i c in line 16. 
The foreground module needs to use this address, its value at assembly time be¬ 
ing unknown, and it gets round this problem by declaring ARRAY as . external in 
line 20 of Table RTTl b). This directive is really saying to the assembler ' hold your 
fire, the actual address will be supplied at a later date via the linker’. Of course 
this is an array of words, as the counter is 16-bits wide. In a similar manner, the 
address of both run-time modules are made known to module VECTOR by declar- 
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ing their start address .public. They are consequently declared .external in 
line 9 of Table l6JT c). 

In the background module, the ramp count x (the Scan pointer) is also used 
as array index (i.e. at position x display ARRAY [x]). However, as each array ele¬ 
ment is a double byte, x is first promoted to 16 bits (line 24) and then multiplied 
by two (lines 25 and 26). The resulting value in Accumulator_D is then used as 
the offset to the X Index register — which is pointing to ARRAY[0] — to abstract 
ARRAY [x]. The same mechanism is used in the update interrupt module to con¬ 
vert the Update pointer i to the array pointer i (lines 29 - 31). Both the Scan and 
Update pointers conveniently wrap around from 255 to zero after incrementing. 
Numbers other than 255 would need to be actively zeroed. 

Notice how in module VECTOR (lines 10 - 12) the start addresses for the Reset 
and FIRQ service routines are located in their appropriate place. In practice other 
vector addresses, not used in this fragment, would be defined here. 

The 6809 MPU can only deal directly with three interrupt requests from sepa¬ 
rate sources. Some applications require many more than can be handled in this 


1 

2 
S 

4 

5 

6 

7 

8 
9 

10 

11 

12 

IS 

14 


Table f6~Tt 6809 code displaying heart rate on an oscilloscope (continued next page). 

.processor m6809 


Background program which scans array of word data (ECG periods) 

Sends out to oscilloscope Y plates in sequence 

At same time incrementing X plates 

so that ARRAY[0] is seen at the left of screen 

and ARRAY[255] at the right of screen 

ENTRY : None 

EXIT : Endless loop 


.define DAC_X=6000h, 
DAC_Y=6001h 


8-bit X-axis D/A converter 
12-bit Y-axis D/A converter 


15 




.psect 

data 

Data space 

16 




.publi 

c ARRAY 

Make the array global 

17 

0000 


ARRAY: 

.word 

[256] 

Reserve 256 words for the array 

18 

1 Q 

0200 


X_C00RD: 

. byte 

[1] 

and a byte for the X co-ordinate 

20 

J 



.psect 

_text 

Program space 

21 




.publi 

c DISPLAY 

Make program known to the linker 

22 

E000 

10CE080C 

DISPLAY: 

1 ds 

#0800h 

Define Top Of Stack 

23 

E004 

F60200 

DL00P: 

ldb 

X_C00RD 

Get X co-ordinate 

24 

E007 

4F 


cl ra 


Expand to word size 

25 

E008 

58 


1 si b 


Multiply by two 

26 

E009 

49 


rol a 


to give array index in Acc.D 

27 

E00A 

8EOOOO 


1 dx 

#ARRAY 

Point to ARRAY[0] 

28 

E00D 

308B 


1 eax 

d, x 

now to ARRAY[X] 

29 

E00F 

F60200 


ldb 

X_C00RD 

Get back X co-ordinate 

30 

E012 

F76000 


stb 

DAC_X 

Send it out to X plates 

31 

E015 

EC84 


ldd 

0, x 

Get ARRAY[X] word 

32 

E017 

FD6001 


std 

DAC Y 

and send it to the Y plates 

33 

E01A 

7C0200 


i nc 

X_C00RD 

Go one on in X direction 

34 

E01D 

20E5 


bra 

DL00P 

and show next sample 

35 




. end 




(a) The background array-display module. 
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Table 6.1 (continued,) 6809 code displaying heart rate on an oscilloscope. 

1 .processor m6809 

2 ' *********“*-*********-*********-**-*********-************-*** 

S 

4 

5 

6 

7 

8 
9 


10 

11 

* 


.define COUNTER =9000h, ; The 16-bit period Counter 

12 

IS 

14 




INT_FLAG=9800h ; The external Interrupt flag 

* 


.psect _data 

Data space 

15 

0201 

U PDATE_I: 

. byte 

[1] 

Space for the array update index 

16 

17 

18 

0202 

LAST_TIME: 

.word 

[1] 

and for the last counter reading 

J 


.psect text 

Program space 

19 



.public UPDATE 

Make routine known to the linker 

20 



.external ARRAY 

Get ARRAY from another module 

21 

E01F 

3436 UPDATE: 

pshs 

a, b, x, y 

For FIRQ save used registers 

22 

E021 

7F9800 

cl r 

INT_FLAG 

Reset external Interrupt flag 

23 

E024 

FC9000 

ldd 

COUNTER 

and get the count from outside 

24 

E027 

1F02 

tfr 

d,y 

Put in Y register for safekeeping 

25 

E029 

B30202 

subd 

LAST_TIME 

Sub frm last cnt gives new period 

26 

E02C 

10BF0202 

sty 

LAST_TIME 

and update last counter reading 

27 

E030 

1F02 

tfr 

d,y 

Y now holds the new period 

28 

E032 

F60201 

ldb 

U PDATE_I 

Get the update array index 

29 

E035 

4F 

cl ra 


Expand to word 

30 

E036 

58 

1 si b 


Multiply by 2 to cope with 

31 

E037 

49 

rol a 


the word nature of ARRAY[] 

32 

E038 

8EOOOO 

1 dx 

#ARRAY 

Point to ARRAY[0] 

33 

E03B 

10AF8B 

sty 

d, x 

Put new value (in Y) in ARRAY[I] 

34 

E03E 

7C0201 

i nc 

U PDATE_I 

Move update marker on one 

35 

E041 

3536 

pul s 

a, b, x, y 

Return machine state 

36 

E043 

3B 

rti 



37 



. end 




' Interrupt service routine to update one array element * 
•' with the latest ECG period, as signalled by the peak detector * 
■' ENTRY : Via a FIRQ interrupt * 
•' ENTRY : Location of ARRAY[0] is globally known through the linker * 
■' EXIT : ARRAY[i] updated, where i is a local index * 
•' EXIT : MPU state unchanged * 


(b) The foreground interrupt service routine updating the array. 




.processor m6809 






Sets up Interrupt and Reset vector at top of ROM 
using globally known labels through the linker 




S 

4 

5 

6 

7 

8 
9 

10 E7F6 E01F VECTOR 

11 E7F8 

12 E7FE E000 RESET: 
IS 


k * * * 


.psect _text 


. publ i c 

.external 

.word 

.word 

.word 

. end 


VECTOR 

UPDATE,DISPLAY 

UPDATE 

[ 3 ] 

DISPLAY 


Make this routine known globally 
These will be got thru the linker 
Addr of the FIRQ service routine 
Skip IRQ, SWI, NMI not used here 
Go to DISPLAY routine on Reset 


(c) The Vector table. 


way. Wiring these n request lines through open-collector gates is a convenient 
way of channelling n lines to one interrupt line, see right side of Fig. 16.71 Nor¬ 
mally the n service request lines are high and the open-collector gates are off, 













HARDWARE INITIATED INTERRUPTS 1 57 


letting IRQ rise through the pull-up resistor to +V. If one or more request lines 
go low then IRQ goes low. MPU-compatible peripheral interface devices, such as 
the 6821 and 68230 PIAs, have integral open-collector buffers at their interrupt 
output lines. 

Given that the MPU has gone to the service routine, how is it to distinguish 
between the various possible sources? A simple procedure is to examine each in¬ 
terrupt flag in turn, until the source is found. Where MPU-compatible peripherals 
are used, this is accomplished by examining the relevant bits in the appropriate 
peripheral Control/Status register. 

Polling in this manner is rather slow but does have the advantage of simplicity, 
and a priority scheme of arbitrary complexity can be implemented in software. 

There are many schemes which speed up the process of distinguishing be¬ 
tween interrupting peripherals CD, one of which is shown in Fig. 16.71 Here, four 
events (e.g. peak detectors) trigger interrupt flags in the manner of Fig. 16.61 These 
four service requests are combined together with open-collector buffers to drive 
the MPU's IRQ line. The state of these four lines can be read at any time through 
3-state buffers at address Vector. Assuming that unconnected data lines read 
as logic 0, we have: 


Request 

0 

1 

2 

3 


Vector 


00000100 (4) 

00001000 (8) 
00010000 (16) 
00100000 (32) 


If more than one request is simultaneously received, intermediate vector values 
will be generated. The appropriate software filtering routine can then separate 
and prioritize the requests, or a priority encoder can be used as a hardware so¬ 
lution. The MPU can then go to the appropriate routine. 

As an extension to this scheme, the vector buffers could be enabled when¬ 
ever the addresses FFF9:Ah are detected on the address bus with the Status bits 
BA BS = 01, rather than the ROM. Thus the address of the program start is di¬ 
rectly generated as a response to the interrupt, but appears to originate at the 
appropriate vector address. In this situation it would be better to read the Service 
Request lines through a priority encoder to remove ambiguities caused by more 
than one peripheral requesting service at the same time 0. Direct vectoring by 
device is the fastest technique available, but is expensive in hardware. 

The 68000 family also makes use of a Vector table to service its various excep¬ 
tional events, but in a rather more flexible manner. The lowest 256 long-words of 
memory, 000000 - 0003 FFh, hold addresses potentially pointing to the beginning 
of 255 service routines as shown in Fig. |6.51 Of these, the bottom two long-words 
are reserved for the critical Reset vector thus: 



Double long-word Reset vector 




Local Address Decoder 



Figure 6.7 Sen’icing four peripherals with one interrupt. 
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When the 68000 MPU is Reset, the initial setting of the Supervisor Stack Pointer 
(not the User Stack Pointer) is fetched from long-word 0 (000000 - 000003b), 
followed by the start value of the Program Counter in long-word 1 (000004- 
00000 7h). This dual vector must be in ROM to ensure a successful cold start 
(i.e. from power up), as must be the equivalent 6809 Reset vector at the top 
of memory. The remaining 254 vectors are normally also located in ROM, but 
clever address decoding can be used to overlay these vectors in RAM. This latter 
procedure allows the software dynamically to relocate exception service routines. 
The external decoder can distinguish between vectors 0 and 1, and 2 to 255 
from the state of the Function Code status pins, which are 1 1 0b for the former 
(Supervisor Program) and 1 01 b for the latter (Supervisor Data) — see page 1691 
As the Supervisor Stack Pointer is set up after the MPU leaves its Reset start-up, 
interrupts can be immediately serviced. The Interrupt Mask bits in the Status 
register are set to 111 b, locking out all but level 7 interrupts (i.e. non-maskable). 


Service Request 


68xxx Pe ripheral In terface 
Service Request 


IACK > 
IRQ 0 - 


AS>- 

LDS>- 
R/W- 


DTACK >- 
dO—d 7 H 


IACK Decoder 


1 ack 7 _ 
1 ACK 6 _ 
IACK 5 _ 
1 ACK 4 _ 
1 ACK 3 _ 
1 ACK 2 


IACK 1 


(see Fig. 6.6) 
Inte rrupt flag 


CY7 
C Y6 
CY 5 
C Y 4 

C Y3 
C Y2 


C Y1 


74LS138 


O 


Pri ority Enco der 

1 (see Fig. 6.3) 


Level 1 


Level2 


-C i 
-C 2 
-c 3 

-c «■ 

-C 5 
-C 6 

-c 7 




A> 


74LS148 


68000 MPU 

-/ 



—a 

Enabled on 

o-— 

i 

'C =111 

-c 


C VPA 


al 

a 2 

a 3 

FC 2 

FC 1 

FCO 


C IPL 2 
C IPL 1 


c IPL 0 


-c DTACK 
H dO—d 7 


Figure 6.8 External interrupt hardware for the 68000 MPU. 

When a 68000 MPU receives an interrupt request of a higher priority than 
its mask setting, it commences an Interrupt Acknowledge read cycle 0. The 
level is echoed on Address lines ai a 2 a 3 , with all other address lines going high. 
The Function Code lines FC2 FC1 FCO are set to 111 and a normal Read cycle is 
implemented. Depending on external hardware, two things can happen. If the in¬ 
terrupting device wishes to use the fixed internal autovector table it responds by 
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bringing VPA low during this Read cycle. More sophisticated peripheral interface 
devices specifically designed for the 68000 MPU can respond by putting a Vector 
number on its data bus and activating DTACK in the normal asynchronous way 
(see Fig. l3.6t . The MPU multiplies this number by four (shift left twice) giving the 
address of the user interrupt vector somewhere in the table. 

Referring to Fig. 16.81 we see that in both cases a 3 to 8-line decoder generates 
one of seven Interrupt Acknowledge signals lACKn from the 3-bit level address. 
This decoder is only active when the Function Code is 1 1 1, that is Interrupt Ac¬ 
knowledge. The rest of the address lines are logic 1 and the general address 
decoding must ensure that nothing else responds to this situation. The level, 
and hence which IACK line is active, is determined by the connection of the pe¬ 
ripheral's service request to a 74LS148 Priority encoder, as described in Fig. 16.31 

First we look at a dumb interface, such as shown in Figs |6.1| and |6.6| which 
cannot generate its own Vector number. In Fig. 16.81 the level-1 request itself and 
acknowledgement (IACK1) are ANDed to drive VPA low. The MPU will go auto¬ 
matically to vector 25 (000064 - 7h) for its level-1 service routine. As previously 


Table [6~2l 68000 code displaying heart rate on an oscilloscope (continued next page). 


.processor m68000 


3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 


Background program which scans array of word data (ECG points) 

Sends out to oscilloscope Y plates in sequence 

At same time incrementing X plates 

so that ARRAY[0] is seen at the left of screen 

and ARRAY[255] at the right of screen 

ENTRY : None 

EXIT : Endless loop 


.define DAC_X=6000h,; 8-bit X-axis D/A converter 

DAC_Y=6001h ; 12-bit Y -axis D/A converter 


00E000 ARRAY: 

00E200 X_C00RD: 


000400 4240 DISPLAY: 

000402 1039 DL00P: 

0OOOE2OO 
000408 E348 
00040A 207COOOOEOOO 
000410 31F000006001 
000416 31F90000E2006000 
00041E 52390000E200 
000424 60DC 


.psect _data 
.public ARRAY 
.word [256] 

.byte [1] 

.psect _text 
.public DISPLAY 
clr.w dO 
move.b X_C00RD,dO 


Data space 

Make the array global 

Reserve 256 words for the array 

and a byte for the X co-ordinate 

Program space 

This program known to the linker 
Get X co-ordinate byte 
expanded to word 


1 si .w 
movea.1 
move.w 
move.w 
addq. b 
bra 
. end 


#l,d0 ; x2 to give array index in D0.W 

#ARRAY,aO ; Point A0 to ARRAY[0] 

0(a0,dO.w),DAC_Y ; Get ARRAY[x] to Y plates 
X_C00RD,DAC_X ; Send X coord to X plates 

#1,X_C00RD ; Go one on in X direction 

DL00P ; and show next sample 


(a) The background array-display module. 
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1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 


Table 6.2 (continued! 68000 code displaying heart rate on an oscilloscope. 
.processor m68000 


Interrupt service routine to update one array element 

with the latest ECC period, as signalled by the peak detector 

ENTRY : Via a Level 1 interrupt 

ENTRY : Location of ARRAY[0] is globally known through the linker 
EXIT : ARRAY[i] updated, where i is a local index 
EXIT : MPU state unchanged 


.define COUNTER=9000h, 


12 



INT_ 

_FLAG=9800h 

13 

; 




14 



.psect 

data 

15 

00E201 

UPDATE_I: 

. byte 

[1] 

16 

00E202 

LAST_TIME: 

.word 

[1] 

17 

; 




18 



.psect 

text 

19 



.pub!ic 

UPDATE 


20 

21 000426 

22 00042A 

23 000430 

24 000436 

25 000438 

26 00043E 

27 000444 

28 000446 

29 00044C 

30 00044E 

31 000454 

32 000458 

33 00045E 

34 000462 

35 


48E7C080 UPDATE: 
427900009800 
303900009000 
3200 

92790000E202 

33COOOOOE202 

4240 

30390000E201 

E348 

207COOOOEOOO 

31810000 

52790000E201 

4CDF0103 

4E73 


.external ARRAY 
movem.l dO/dl/aO 


cl r 

move.w 
move.w 
sub.w 
move.w 
cl r.w 
move.w 
1 si . w 
movea.1 
move.w 
addq .w 
movem.1 
rte 
. end 


The 16-bit period Counter 
The external Interrupt flag 

Data space 

Space for the array update index 
and for the last counter reading 


Program space 

This routine known to the linker 
Get ARRAY from another module 
(sp); Save used registers 

Reset external Interrupt flag 
& get the count from the counter 
Put in D0.W for safekeeping 
Sub from last cnt for new period 
and update last counter reading 
Prepare to get update array index 
expanded to word size 
x2 to cope with word ARRAY[] 

Point A0.L to ARRAY[0] 
dl,0(a0,d0.w); New value (Dl.W) to ARRAY[I] 
#1,UPDATE_I; Move update marker on one 
(sp)+,d0/dl/a0; Return machine state 


INT_FLAG 
COUNTER,dO 
d0,dl 
LAST_TIME,dl 
dO,LAST_TIME 
dO 

UPDATE_I,dO 
#l,d0 
#ARRAY,aO 


(b) The foreground interrupt service routine updating the array. 

.processor m68000 


1 
2 

3 

4 

5 

6 

7 

8 
9 

10 000000 

11 000004 

12 000008 

13 000064 

14 


Sets up interrupt and reset vectors at bottom of ROM 
using globally known labels through the linker 




.psect _text 
.public VECTOR 
.external UPDATE,DISPLAY 
OOOOFOOO VECTOR: SSP:.double OFOOOh 
00000400 PC: .double DISPLAY 

.double [23] 

00000426 LEVEL1: .double UPDATE 

. end 


Make this routine known globally 
These will be got through the linker 
Initial value of the System Stack 
Go to DISPLAY routine on Reset 
Other vectors not used here 
Addr of Level-1 IRQ serv routine 


(c) The Vector table. 


described, the Interrupt flag must be lifted at this time. 

Smart interfaces, such as the 68230 PI/T, are interfaced to the MPU in the 
normal way, see Fig. l3.13l Their IACK input is driven by the appropriate IACK 
decoder line, and the vector number put on the data bus during the concurrent 
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Read cycle. This vector number is programmed into the appropriate interface 
register (the Port Interrupt Vector register in the 68230 0) during the setup rou¬ 
tine. If we wanted to vector via address 0001 OOh, the programmed-in vector 
number would be 40 h (0001 00 -p 4 h). Vector numbers 0-63 should not be used, 
although there is nothing physically to prevent this. Should a 68xxx peripheral 
interface not have its vector register set up when an interrupt occurs, a default 
vector i5 will be sent to indicate Uninitialized Interrupt. 

The software for our example is given in Table [672] It matches the listing of 
Table l6d~l for the 6809 MPU, and the comments made there apply equally. Notice 
that the Interrupt Service routine UPDATE is terminated by RTE, the 68000 equiva¬ 
lent for RTI. I have assumed that a level-f autovector is being used as a pointer to 
the service routine. A simple change of operand in line 12 of Tablc RT721 'c) would 
move the start address to any other appropriate vector number. 

Vector 24 is described in Fig. |6.5f c) as a Spurious Interrupt. This startup ad¬ 
dress will be used if external circuitry asserts the Bus_Error (BERR) pin during an 
Interrupt Acknowledge Read cycle. The hardware designer may wish to do this 
when DTACK (or VPA) is not activated within a fixed time after the start of this 
cycle; to indicate a hardware problem. Such circuitry is frequently implemented 
as a retriggerable monostable which ' collapses' if not clocked frequently enough. 
Such a watch-dog timer can of course be used to indicate trouble out there' during 
a normal (i.e. not Interrupt Acknowledge) cycle. In such cases the MPU returns to 
the Supervisor state and enters the Bus Error exception service routine pointed to 
by Vector 2. Should the BERR signal persist when the status is being pushed out 
to the Supervisor stack on entry to the service routine, a catastrophic situation is 
assumed to have occurred. Such a Double-Bus fault causes the MPU to stop, with 
both Halt and Reset going low. This response will occur in general where a prob¬ 
lem occurs when an exception (including a Reset) tries to Push out its registers, 
for example when the Supervisor Stack Pointer is odd. 

Another possibility is to assert BERR and Halt simultaneously. Then the failed 
bus cycle will be rerun, with the hope that a spurious failure occurred (perhaps 
due to noise) and that the situation can be redeemed m. 


6.2 Interrupts in Software 

Interrupts occur when something outside requests assistance. The MPU responds 
by saving all or part of its internal state on the System stack and going to a 
service routine via a table of addresses. It is also possible to initiate a s imi lar 
response by internal software means, either deliberately or via a dubious event, 
such as having a zero division for the DIVU/DIVS instruction. Software initiated 
exceptions are commonly known as Software Interrupts or Traps. In this section 
we will briefly consider these operations and other instructions associated with 
Exceptional operations. 

Interrupt service routines are normal subroutines but terminated with an in¬ 
struction or instructions that restore the state saved when responding to the 
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Table 6.3 Exception related instructions. 


Operation 

Mnemonic 

Description 

ReTurn 

from Interrupt 

RTI 

Switch back to background 

Pulls context back from System stack 

Synchronize 

Clear and WAIt 

CWAI #kkg 

Halt until interrupt 

Clear CCR bits ([CCR] <- [CCR]-kk), 
save entire state and wait for interrupt 

SYNChronize 

SYNC 

Stop until interrupt occurs, THEN: 
continue if masked out, ELSE 
go to interrupt service routine 

Trap 

Software Interrupt 

Software Interrupt 2 
Software Interrupt 3 

SWI 

SWI2 

SWI 3 

Software-initiated interrupt-like sequence 

Save entire state and vector via FFFA:Bh 
and mask out 1 and F Hardware interrupts 

As above but vector FFF4:5h and no masking 
As above but vector FFF2:3/i and no masking 


(a) Relating to the 6809. 


ReTurn 


Switch back to background 

from Exception 1 

RTE 

Pulls context back from Supervisor stack 

Synchronize 


Halt until interrupt 

STOP 1 

STOP #kk i6 

[SR] <- #kk and wait for interrupt 

Trap 


Software-initiated interrupt-like sequence 

CHecK Bounds 

CHK <ea>,Dn 

IF 0 > Dn-W > ea THEN exception via vector 6 

ILLEGAL Instruction 

ILFEGAL 

Exception via vector 4 

TRAP 

TRAP #kk 4 

Sixteen software interrupts via vector 32 + #kk 

TRAP on overflow 

TRAPV 

IF V = 1 THEN exception via vector 7 


(b) Relating to the 68000. 
Note 1: Privileged instructions. 


interrupt. This is true of both hardware and software-initiated responses. In the 
6809 processor the state returned by Return from Interrupt (RTI) depends 
on the setting of the E flag, either all registers if E is zero otherwise only the PC 
and CCR. The equivalent 68000 ReTurn from Exception (RTE) always returns the 
PC and SR only. The same is true for the 8086 MPU family, where the instruction 
is Interrupt RETurn (IRET). 

Most MPUs have at least one instruction which halts the processor until an 
interrupt (or reset) occurs. From Table 16.31 a). we see that the 6809 proces¬ 
sor has two related instructions categorized as such. Clear and WAIt allows 
the programmer to clear the F or I mask if desired prior to stopping. Thus 
CWAI #10111111b clears F and stops the processor after saving the entire ma¬ 
chine state in the System stack (E set). If at some time in the future a NMl or FIRQ 
request is sent, the MPU will immediately go to the appropriate service routine. 
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An IRQ will have no effect in this example, as it is masked out. Notice that unusu¬ 
ally a FIRQ will enter its service routine with the entire machine state (context) 
saved. 

The SYNCHRONIZE instruction is similar, although any CCR flags will have to 
be set by a preceding instruction. However, this time if the interrupt occurs but is 
masked out, then the processor will simply move on to the following instruction. 
If the interrupt is not masked out, and lasts for three clock cycles or more, then 
it will be answered in the normal way. Tri-state buses go high impedance during 
SYNC, allowing an external device to access memory directly IfTOl . 

The 68000's STOP instruction is comparable with the 6809's CWAI, but the im¬ 
mediate word operand is the new state of the Status register, rather than being 
ANDed with it. For example, STOP #001000 011 00000000b will halt the proces¬ 
sor until an interrupt of level greater than 3 occurs. The MPU then responds in the 
normal way. STOP is privileged and thus can only be used in the Supervisor state. 
The machine context is not switched prior to the request. The equivalent HaLT 
(HLT) for the 8086 family does not carry an immediate operand, but otherwise 
operates in the same manner. 

The 6809 MPU has three instructions which explicitly initiate Software inter¬ 
rupt operations. SWI causes the entire state to be saved, sets the I and F masks 
to lock out all but NMI interrupts, and then vectors to the start of its service 
routine via FFFA:B/i. Instructions SWI2 and SWI3 are si mi lar but using vectors 
FFF2:3h and FFF4:5h respectively to hold their start address, and not locking out 
the Hardware interrupts. 

The 68000 MPU has 17 Software interrupts, known as TRAPS. Sixteen of these, 
TRAP #0 to TRAP #15 are unconditional and TRAPV is only implemented if the 
overflow flag is set at execution time. Looking at Fig. l6.51 c). we see that TRAP #0 
vectors via location 000080 - 3 h up to TRAP #15 at 0000BC - F h, Exception vec¬ 
tors 32 to 47. TRAPV has its service address located at 00001C-00001 F h. Like 
all other Exceptions, Traps execute in the Supervisor state. 

Although what a Software interrupt/Trap does is clear enough, the reason for 
using one is not entirely evident. Consider an environment where an applications 
program is being written for a specific computer system. This system will have 
various means of communicating to the world, using typically a keyboard, VDU, 
serial and parallel ports, interrupts and various disk drives. Knowing the charac¬ 
teristics of all these input/output (I/O) devices, the programmer can write a suite 
of subroutines known as device handlers. Once this has been done, data can 
be transferred by calling up the appropriate handler. However, a change of en¬ 
vironment to a different computer will likely require a complete rewrite of these 
handlers. 

This approach is frequently adopted by the designers of embedded micropro¬ 
cessor systems, where the hardware infrastructure is usually highly individual¬ 
istic. Some standardization is possible for mass-produced computing machines, 
such as engineering workstations and personal computers. These normally come 
with an operating system, which can be thought of as a shell around the applica¬ 
tions software shielding the programmer from the hardware. Typical operating 
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systems are UNIX imn and MSDOS HZ). These systems are mainly disk-based 
loaded into RAM, but work in tandem with a Basic Input Output System (BIOS), 
usually located in ROM. The applications programmer can then call up the ap¬ 
propriate subroutine in the BIOS, to communicate with a peripheral. The BIOS 
ROM will vary with different machines, but in such a way as to hide the hardware 
details from the operating system. The use of an operating system leads to the 
concept of system-independent (portable) software. 

Using a Trap call to communicate, rather than a subroutine, has the advantage 
that the address of the procedure need not be explicitly known, as the vector table 
will be in the BIOS. Hiding explicit details of the BIOS is important for portabil¬ 
ity. Thus, as an example, I NT #2 5 in a MSDOS environment fl2l will enable a 
Read from a magnetic disk (INT is the 8086 family mnemonic for TRAP). Param¬ 
eters such as track, sector and drive are placed in registers prior to the Trap. In 
68000-family based systems, the operating system normally resides in the Super¬ 
visor state, completely separated from the application program in the User state 
memory space. 

The 68000 MPU has two additional explicit software interrupt instructions. 
The instruction ILLEGAL (op-code 4AFCh) causes a transfer via vector address 
000010 - 1 3 h and the CHecK register (CHK) instruction vectors via 00001 8 - Bh 
if the lower word of the designated Data register is below zero or above the stated 
li mi t. 

There are also a number of implicit traps, triggered by some internal event. 
These are: 

Address Error, OOOOOC-OFh 

Entered when a word or long-word access to an odd address is attempted. 

Illegal Instruction, 00001 0-13 h 

Entered if an illegal op-code is encountered, but see line A and line F Exceptions 
below. 

Divide by Zero, 00001 4 - 1 7h 

Entered when the divisor for DIVU/DIVS is zero. 

Privilege Violation, 000020-23 h 

Entered when there is an attempt to execute a privileged instruction (e.g. STOP) 
while in the User state. 

Trace, 000024-2 7h 

Entered after each instruction if the T flag is set in the Status register. Used 
during debugging to monitor the state of the processor if the appropriate Trace 
Service routine is in situ [13l . 

Line A Op-Code, 000028 - 2Bh 

Entered when the upper 4 bits of the op-code are 1 01 0b. These op-codes are 
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unused, but this facility provides the means for emulating unimplemented in¬ 
structions in software. 

Line F Op-Code, 00002C-2F/1 

Entered when the upper 4 bits of the op-code are 1111 b. Used as above (the 
68020 MPU uses these codes for co-processor instructions, and therefore service 
routines are often used to simulate these missing instructions in software). All 
other unimplemented instructions vector via the Illegal instruction vector address 
above. 
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PART II 



The only reality as seen from a central processing unit, be it mainframe, mini or 
microprocessor, is in the patterns of binary states in memory. This is generally 
far removed from the human description of the task which is to be controlled by 
the processor hardware. In going from the problem specification to executable 
binary installed in memory involves many steps, both conceptual and in software. 
Many translation processes must occur on the way (see Fig. rm> . Furthermore, 
testing, debugging and commissioning the system require additional skills and 
aids. 

In Part 2 we look at these steps in some detail, how they interact and their 
limitations. In particular we will investigate the use of the high-level language C as 
a buffer between the problem-oriented human thought process and the machine- 
oriented assembly-level languages. Many of the concepts introduced here apply 
to other high-level languages, such as Pascal and Forth, but C is a small language 
which is widely available, especially in a cross form, popular, flexible and can run 
on inexpensive development systems. I can do no better than quote from the 
originators of the language^ 

C is a general-purpose programming language featuring economy of ex¬ 
pression, modern control flow and data structure capabilities, and a rich 
set of operators and data types. 

C is not a very high-level' language nor a big one and is not special¬ 
ized to any particular area of application. Its generality and an absence 
of restrictions make it more convenient and effective for many tasks than 
supposedly more powerful languages. C has been used for a wide variety 
of programs, including the UNIX operating system, the C compiler itself, 
and essentially all UNIX applications software. The language is sufficiently 
expressive and efficient to have completely displaced assembly language 
programming on UNIX. 


1 Ritchie, D.M. ef air, The C Programming Language, The Bell System Technical Journal, 57, no. 6, 
part 2, July-August 1978, pp. 1991 - 2019. 




CHAPTER 7 


Source to Executable Code 


Consider the fragment of code below. To a 68000-family MPU this makes perfect 
sense. Indeed a series of binary bits, typically represented by nominal 0 V and 5 V 
potentials stored in memory, is the only code that a MPU or any other type of 
computer, can understand. To the software engineer, interpreting programs in 
this pure machine code is virtually impossible. Writing code in this form is 
torturous, involving at the very least working out each op-code by hand, together 
with bits representing source, destination and any applicable data; evaluating 
relative offsets; and keeping tally of where data is stored. 

0001000000111000 0001001000110100 

0101110000000000 

0001111000000000 0001001000110101 

Even with a program written in such a form, some means must be found of 
putting or loading the code to its final place in memory. Very early computers 
did not use electronic memory at all, the code being configured by wire links. 
Using switches to set up each memory address and its corresponding data, in 
effect a kind of direct memory access, was still used up to the 1960s to enter a 
short startup program. This program was known as a bootstrap, as once in and 
executed, a paper tape reader could be controlled. Programs could then be read 
in from this source, that is the computer was able to pick itself up by its own 
bootstraps. A modern version of this is the resident BIOS in a PC, which allows 
the MPU to read in the operating system from magnetic disk after switch on, 
hence the term to boot up'. 

Using the computer to aid in translating code from more user-friendly (human) 
forms to machine code and loading this into memory began in the late 1940s. At 
the very least it permitted the use of higher order number bases such as octal 
and hexadecimal. Using the latter, our code fragment becomes: 

1038 1234 

5C00 

1E00 1235 

A hexadecimal loader will translate this to binary and put the code in designated 
addresses. Hexadecimal coding has little to commend it, except that the number 
of keystrokes is reduced (but there are more keys!) and it is slightly easier to 
spot certain types of errors. Nevertheless, this technique was extensively used in 
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the early 1970s for microprocessor software generation and is often still used in 
education as a first introduction to progra mmi n g simple MPUs. 

At the very least a symbolic translator or assembler is required for serious 
programming. This allows the use of mnemonics for the instructions and internal 
registers with names for constants, variables and addresses. We now have: 


.DEFINE 
MOVE.B 
ADDQ.B 
MOVE.B 
.ORC 

NUM1: .BYTE 
NUM2: .BYTE 


CONSTANT = 6 
NUM1,DO 
#C0NSTANT,DO 
DO,NUM2 
1234h 
[ 1 ] 

[ 1 ] 


Get the number NUM1 
Add the constant to it 
is now the number NUM2 
This is the data area 
NUM1 lives at 1234h 
and NUM2 at 1235h 


Giving names to addresses and constants is especially valuable for long pro¬ 
grams. Together with the use of comments, this makes code written in assembly 
level easier to maintain. Furthermore, programs can be written as separate mod¬ 
ules with symbols defined in only one module and a linker program used to 
put them together with their actual values. This assembly of modules into one 
program gave the name assembly-level to this type of language |T|. Of course 
assemblers/linkers and their ancillary programs are rather more complex than 
simple hexadecimal loaders. Thus they demand more of the computer running 
them, especially in the area of memory and backup store. Because of this, their 
use in small MPU-based projects was limited until the early 1980s, when power¬ 
ful personal computers (made possible by MPUs) appeared. Prior to this, either 
mainframe and minicomputers or target-specific microprocessor development 
systems (MDSs) were required. Any of these solutions were expensive. 

Assembly-level language is machine-oriented, in that there is generally a one- 
to-one correspondence to the machine instructions. As such, code written at this 
level bears little relationship to the problem being implemented. The use of a 
high-level language permits a description of the problem in an algorithmically- 
oriented language. In C, our code fragment becomes: 

#define CONSTANT = 6 

unsigned char NUM1,NUM2; /* Define NUM1 and NUM2 as unsigned bytes */ 

{NUM1 = NUM2 + CONSTANT;} /* The process */ 

Now we no longer need to keep track of exactly where NUM1 and NUM2 have to 
be stored. Also we have a large repertoire of mathematical and string functions, 
which do not have a one-to-one machine level counterpart. Notice that our pro¬ 
gram did not indicate which processor's machine code would eventually be pro¬ 
duced, the target might well be a Z80 rather than a 68000 (see Table [ITT. 151 . 

Of course there are problems in using high-level languages, especially when 
the target is an embedded MPU-based system. In general the further away the 
level is from the machine code, the more isolated the programmer is from the 
raw hardware. A compiler also demands much more of its supporting computer, 
and for this reason only recently became popular as a tool in this type of design. 
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Figure 7.1 Onion skin view of the steps leading to an executable program. 

Many high-level languages compile their syntax into assembler-level source 
code which is then translated and linked in the same manner as ' hand written' 
assembly code. Thus in this chapter we will be looking at assemblers, linkers, 
loaders and their associated programs as well as compilation and related pro¬ 
cesses. 


7.1 The Assembly Process 

We have used assemblers at some length in Part 1 of this text, to present a more 
palatable interface to the reader of the (binary) software aspects of two micropro¬ 
cessors. Without going into any detail, we have seen that a Symbolic Assembler 
program (or assembler for short) allows us to use predefined symbols for the 
instructions and various processor registers, and to define names for constants, 
variables and memory locations. They take the drudgery out of calculating rel¬ 
ative offsets and converting number bases. Comments, which are ignored at 
translation time, make maintenance easier than raw code. The use of a conve- 
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nient editor allows alterations to be easily made to the source code, which can 
then be quickly retranslated with the updated symbolic and offset values [QQ 121 ■ 

In faithfully reflecting the underlying structure of the hardware, assembler 
code can produce the smallest and quickest code of any of the symbolic lan¬ 
guages. Even though it is furthest away from the problem algorithm, these advan¬ 
tages frequently mean that assembly-level routines are linked in with high-level 
code, or even used entirely to implement problems, especially when real-time 
operation is required. 

Assemblers are one of a class of translator programs and are available from 
a wide range of originators for most target processors. Although some attempt 
has been made to standardize syntax CDS), normally each package has its own 
rules. Generally the MPU manufacturer's recommended mnemonics are adhered 
to reasonably closely. Directives, which are pseudo operators used to pass infor¬ 
mation to the assembler program, do differ considerably. Details of the layout 
and syntax for the assemblers used in Part 1 are given in Section 2.3 and will 
not be repeated here. Differences in other assemblers used later in this text are 
pointed out where they occur. 

No matter which language is being used, the programmer must prepare the 
source form of the code in the appropriate format and syntax. This preparation 
involves the use of an editor program or word processor. The actual one used 
is irrelevant, provided that the text is stored in a form which can be read by 
the translator, usually plain ASCII. Most operating systems come with a basic 
editor, for example MSDOS's EDLIN and UNIX's ED. More sophisticated packages, 
such as WordPerfect, are usually favored for larger projects. Table 17.11 shows 
a slightly modified source form of the sum-of-integers program first presented 
in Table l4~T0l (actually entered using EDLIN). This document, which is normally 
stored on magnetic disk, is the file presented to the assembler for translation. 
Conventionally the file name is postfixed . S, . SRC or .ASM for assembly source, 
thus the file printed in Table [7711 was called 1 i st7_l. s. 

Assemblers can be broadly classified as absolute or relocatable, according to 
the type of code they produce. The former normally generates a file with the 
machine code and its absolute location ready to be loaded into memory. This 
machine code file is a fi ni shed entity, to which no further alterations need be 
or should be made before loading. The output of a relocatable assembler is not 
yet complete, as it usually does not contain information regarding the eventual 
location of the machine code in memory. Furthermore, symbols may be used in 
the source code which are not defined at this juncture and which are assumed to 
be in modules coming from elsewhere. It will be the job of a Linker program to 
satisfy these unrequited references and to define code addresses. 

Absolute assemblers tend to be simpler to use, as the path between source and 
machine code is more direct, as can be seen in Fig. l7.21 a). Despite their simplicity 
they are rarely used in major projects due to their lack of flexibility. 

As a demonstration, consider the source code listed in Table [7711 This is vir¬ 
tually identical to the source of Table l4T0l but with the directive .ORG replacing 
. PSECT. As this source is to be processed by an absolute assembler, the pro- 
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Table 7.1 Source code for the absolute assembler. 
.processor m68008 


* FUNCTION 

* ENTRY 

* EXIT 

Sums all unsigned word numbers up to n 
n is passed in Data register DO.W 

Sum is returned in Data register Dl.L 

(max 65,535) * 




. defi ne 

L0NG_MASK = OOOOFFFFh 

Used to promote word to long 


. org 

0400h 

Program starts at 0400h 

; for 

(sum=0;n>=0; 

n—) { 


SUM_0F 

_INT: and.l 

#LONG_MASK,dO 

n promoted to long 


clr.l 

dl 

Sum initialized to 00000000 

SLOOP: 

add. 1 

dO, dl 

sum = sum + n 


dbf 

dO,SLOOP 

n--, n>-l? IF yes THEN repeat 

SEXIT: 

rts 




. end 




grammer must specify the start address or origin (ORG) of each section of code 
or data. The . ORG directive may be used as many times as required to locate the 
various sectors, thus if necessary each subroutine may be located at a specific 
start address. 

In translating this source code input, the absolute assembler produces four 
kinds of output. Should there be a problem with the syntax of the source, an 
error file will be produced, giving the line in which it occurred and usually a 
short description. Sometimes a syntax error in one line can lead to problems in 
several other places. Table [7721 is an example of such a file, it was generated by 
replacing the instruction AND in line 11 of our source by the illegal mnemonic 
ANP and the referenced label SLOOP in line 13 by LOOP, that is DBF DO, LOOP. The 
source file is referred to as a: 1 i st7_2 . s. 

If all goes well, zero errors will be produced. This does not of course guarantee 
that the program will work, only that there are no syntax errors! In this situation 
a listing file will be generated, as illustrated in Table [731 This shows the original 
source code together with addresses and the translated code. Other information 
may be provided as well. In this case a cross-reference table shows where names, 
other than reserved mnemonics and directives, are first defined and where they 


X68030 (1): 


Table 7.2 A typical error file. 


a:list7_2.s 11 
a:list7_2.s 14 
a:list7_2.s 11 
a:list7_2.s 11 


unknown op-code anp 
LOOP not defined in file or include 
anp not defined in file or include 
system error <> # 


a:list7_2.s: 4 errors detected 
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Table 7.3 Listing file produced from the source code in Table \7.1\ 
.processor m68008 


FUNCTION : Sums all unsigned word numbers up to n (max 65,535) 
ENTRY : n is passed in Data register DO.W 
EXIT : Sum is returned in Data register Dl.L 


6 

7 

8 

.define LONG. 

.MASK 

= OOOOFFFFh 

; Used to promote word to long 

9 



org 

0400h 

; Program starts at 0400h 

10 

; for (sum=0;n>= 

'-r J 

1 

c 

o 




11 

000400 0280 SUM_ 

_0F_INT: 

and. 1 

#L0NG_MASK,d0 

; n promoted to long 


0000FFFF 





12 

000406 4281 


clr.l 

dl 

; Sum initialized to 00000000 

13 

000408 D280 

SLOOP: 

add. i 

dO, dl 

; sum = sum + n 

14 

00040A 51C8FFFC 


dbf 

dO,SLOOP 

; n--, n>-l? IF yes THEN repeat 

15 

00040E 4E75 

SEXIT: 

rts 



16 



end 




SYMBOL DEFIN 

REFERENCES 




LONG MASK 

8 

11 




SEXIT 15 






SLOOP 13 

14 




SUM_0F_INT 11 






dO- 

11 

13 

14 



dl- 

12 

13 




m68008 - 

1 





are referred to. This can be useful when maintaining large programs. Listing files 
of this nature are for documentation only and have no executable function. 

Symbol files list all symbols which occur in the program, giving name, location 
and sometimes other information. In Table [73] three labels are implicitly identi¬ 
fied, SUM_OF_INT is located at 0400h (the Ox prefix is the hexadecimal indicator 
used in C), SLOOP at 0408h, and SEXIT at 040Eh. The suffix t indicates text (i.e. 
program section). The label L0NG_MASK is explicitly valued and is suffixed a for 
absolute. See Table [TJlla) for a more complex example. 


Table 7.4 Symbol file produced from the absolute source of Table \7. / I 
OxOOOOffffa LONG_MASK 
0x0000040et SEXIT 
0x00000408t SLOOP 
0x00000400t SUM_OF_INT 


Symbol files are commonly used by simulator (see Section 15.2) and in-circuit 
emulator software (see Section 15.3) to replace addresses by their symbolic equiv¬ 
alents, to aid in the debug process. They are also useful as a documentation aid. 
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The most important output from an absolute assembler is the machine-code 
file. This is absolute object code, giving addresses and their contents, ready to 
be loaded into memory and run. In the microprocessor world there are several 
formats of machine code files which have been adopted as de facto standards. 
Although these have been developed by specific manufacturers, notably Intel, 
Motorola, Texas Instruments and Tektronix, in the main they can be used in¬ 
terchangeably by any processor. The type to be generated by an assembler can 
often be specified, and of course must be compatible with a format that can be 
accepted by the loader program. 

Table [73] shows machine code files (often called hex files) produced by our 
example to several formats. The most common of these is the Intel hexadecimal 
object format, originally designed for the 8080 MPU. Each code record line com¬ 
prises an initial colon marker followed by the number of code bytes. The record 
is terminated by a checksum byte, defined as the 2's complement of the modulo- 
256 (8-bit) sum of all the preceding bytes (two hexadecimal digits). As a check, 
the loader program sums all bytes plus the checksum for each downloaded line 
and accepts the accuracy of the data if the result is zero. There is a ||| chance 
that a corrupted record will not pass this trial GO- The last line should have a 
record type 01. 

Expanding the (single) code record of Table [731 a) gives: 


Start of line 

Number of code bytes (16) 
The address of the first byte 
Record type (code) 

Code 

Checksum (2's complement) 


10 


0400 

00 

02800000FFFF4281D28051C8FFFC4E75 

80 


Originally developed for the 6800 MPU, the Motorola S1/S9 object format is 
similar, with a starting marker of S followed by 1 for a code record and 9 for a 
termination line. This is succeeded by a count byte, which indicates the number 
of bytes trailing the SI or S9 field (including itself), a 4-byte address field and 
then the code bytes. The checksum field is the l's complement of the modulo-256 
sum of all bytes following SI or S9. The loader should sum each line including 
the checksum to FF h if the line has been correctly received. Using this format, 
we have from Table [73t b): 

SI Start of code line (S field) 

13 Number of bytes after S field (19) 

0400 The address of the first byte 

02800000FFFF4281D28051C8FFFC4E75 Code 

7C Checksum (l's complement) 


Neither of these object formats can handle addresses of more than 2-byte size. 
The Motorola S2/S8 format, developed for the 68000 MPU, is an extension to the 
S1/S9 format, but with a 3-byte address field. The S3/S7 format is used for 32-bit 
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Table 7.5 Some common absolute object file formats. 

:10 0400 00 02800000FFFF4281D24051C8FFFC4E75 80 
:00 0000 01 FF 

(a) Intel format. 

51 13 0400 02800000FFFF4281D24051C8FFFC4E75 7C 
S9 03 0000 FC 

(b) Motorola SI/S9 format. 

52 14 0FC400 02800000FFFF4281D24051C8FFFC4E75 AC 
S8 04 0FC400 28 

(c) Motorola S2/S8 format. 

:02 0000 02 FC40 CO 

:10 0000 00 02800000FFFF4281D28051C8FFFC4E75 84 
:00 C400 01 3B 

(d) Extended Intel format. 

processors, which require 4-byte addresses. Table |7.5l c) shows the hex file for our 
example but originated at 0FC400/1. The extended Intel hexadecimal equivalent 
is rather more complex as it was designed to cope with the segmented address 
space of the 80x86 family. This uses an extended address record (type 02) if the 
load address is over FFFFh. The data field here holds a 4-digit address which is 
shifted left four times by the loader (giving here FOOOOh) before being added to 
a subsequent 01 type data records' start addresses (here C400h) to give a 5-digit 
load address (i.e. 0FC400h). 

The actual mechanism of the translation process used by the assembler is 
of little importance to us here. Most assemblers are described as 2-pass, as 
historically all but the simplest read the source code, which was frequently on 
paper tape, twice through from beginning to end. During the first pass a loca¬ 
tion counter keeps track of where each instruction is to be placed in memory. 
In an absolute assembler, this will be set by any .0RG directive (0400h in Ta¬ 
ble [ZD- As each operation mnemonic is encountered, the location counter is in¬ 
cremented by the appropriate number; thus AND. L #L0NG_MASK,DO causes the 
location counter to advance by six. 

As labels are encountered, their name and the state of the location counter are 
stored in the symbol table, which is built up during the first pass. Labels which 
are explicitly defined, such as L0NG_MASK, are of course added to the symbol table 
without a translation being necessary. 

It is necessary to build up a symbol table in the first pass to cope with forward 
references; thus an instruction BRA NEXT, where NEXT is further on down the 
source file, cannot be fully translated until NEXT has been encountered and given 
a value. Some assemblers may save any translated machine code to speed up the 
second pass. 

During the second pass, the translation is repeated, but this time any refer¬ 
ences to symbolic names are replaced by the values extracted from the symbol 
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Table 7.6 A simple macro creating the modulus of the target operand. 
3 ; Define macro 


4 


. macro 

LABSOLUTE 

5 


tst.l 

?1 

; Is number in ?1 positive 

6 


bpl 

1$ 

; IF so then no action to be taken 

7 


neg. 1 

?1 

; ELSE negate it 

8 

Q 

1$ 

: .endm 


; Continue 

10 

11 

12 

; Now this macro 
; followed by an 

can be evoked at any 
operand 

time by using its name 

13 

14 

; This fragment 

converts [DO. 

L] to an 

absolute value 

15 

000400 4A80 

6 A0 2 

4480 

LABSOLUTE 

dO 

tst.l dO 
bpl 1$ 
neg.l dO 


16 1$: 


17 ; This fragment converts 20 long words from ElOOh up to absolute form 

18 ; 

19 000406 303C0013 

20 00040A 307CE100 

21 00040E 4A98 

6 A0 2 
4498 

22 000414 51C8FFF8 

23 ; 

24 ; 


move.w #19,dO 

move.w #0E100h,a0 ;- 

LOOP: LABSOLUTE (a0)+ tst.l (a0)+ ~ 

bpl 1$ 

neg.l (a0)+ ~ 

1$: dbf dO,LOOP ;- 


table. With the translation complete, listing, symbol and object files are created 
in the appropriate format. 

In general, assemblers bear a one-to-one relationship to their translated ma¬ 
chine code. Macroassemblers represent a useful upward extension, by allowing 
the programmer to define a group of assembly-language instructions as a named 
macro 0. This macro can be used repetitively anywhere in the program by sim¬ 
ply naming it, followed by a list of operands. The assembler expands this source 
line to its fundamental components whenever that name is encountered. The 
programmer can thus emulate more powerful instructions that are not in the 
MPU's repertoire. As an example, consider the operation where a long-word is to 
be converted to its modulus (positive equivalent). This can be done by testing 
for negative and if true negating the target. This sequence is defined in the body 
of a macro in Table [71)1 between the directives .MACRO and . ENDM. The macro 
name is LABSOLUTE (long absolute value) and takes one operand, a Data register 
or address mode. This is indicated in the body of the macro by the dummy ?1 
(first operand; this assembler can take up to nine). The numeric label 1$ used 
in line 8 has the property that its lifetime only extends to the end of the macro. 
This is necessary, as macro labels will appear in each expansion; and will thus be 
defined several times, see Table fA6l lines f5/f6 and 21/22. 

The macro is invoked by using its name LABSOLUTE followed by the operand. 
In Table lA6l this is done twice, the first specifying a Data register (LABSOLUTE DO) 
and the second the Address Register Post-increment address mode based on A0 
(LABSOLUTE (A0)+). 
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A logical progression of this ability to create a new and more powerful in¬ 
struction set is the evolution of a high-level assembler 0, or even a high-level 
language. 

The 2-pass principle (and the use of macros) apply equally to relocatable as¬ 
semblers. This time the symbol table cannot be fully resolved, as some symbols 
appear in other modules. This resolution is the job of a linker program, which is 
the subject of the next section. 


7.2 Linking and Loading 

A long program is best implemented by breaking it up into a number of function¬ 
ally distinct modules, which can be developed separately. Each module is likely 
to have to cross-reference (XREF) variables from other modules and possibly with 
a library of standard functions. Full details of these will not be known at the 
time these modules are designed. Thus there will be a need for a task builder 
to bring all these bits together, filling in these external symbols to give a sin¬ 
gle composite executable program. This task builder is called a linkage-editor or 
simply linker |8j 9]. Assemblers which work in tandem with a linker are known 
as relocatable. 

We have already used a relocatable assembler-linker to produce the listings 
of Tables [6711 and [6721 These programs comprised three modules: the display 
module which did the background task of outputting data from an array to an 
oscilloscope; the foreground interrupt service routine which updates the array, 
and the vector table entries. The three modules, Display, Update and Vector, were 
assembled separately and then linked together to give the composite program. 

The most important difference between an absolute and relocatable assembler 
is in the treatment of symbols. Symbols explicitly allocated constant values by 
the programmer, such as COUNTER in line 11 of Table 16.21 b). are absolute and 
require no further attention by the li nk er. 

Symbols defined implicitly by attaching a label to an instruction are relocat¬ 
able, since their value is only known relative to the start of the module, the lo¬ 
cation of which will be determined by the li nk er. The label DLOOP in line 23 of 
Table RTTt a) is relocatable two bytes after the start of the Display module. Even 
more vague are symbols referred to but not defined in a module. Such symbols 
are assumed to be defined in some other module and should be declared so by 
an external declaration, for example the label ARRAY of line 20 in Table [672T b). 

Besides being tagged Absolute, Relocatable or External, symbols have the at¬ 
tributes of being Global (Public) or Local. By default local symbols cannot be ref¬ 
erenced from an outside module, for example DLOOP in Table fOUl a). If a symbol 
is to be globally known then it must be declared as such. In line 16 of Tablc lQUl a). 
ARRAY is declared public by using the . PUBLIC directive. Other assemblers use 
the CLOBL or XDEF directives. The assembler must pass on the symbol names, 
tags and attributes to the linker together with the machine code in its output 
relocatable object file. 
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Machine code is passed to the linker in streams. The RTS assemblers fun¬ 
damentally identify two streams, one for program code and the other for data. 
Programs in Tables l6.1l and l6.2l used the directives .PSECT _TEXT for the former 
and .PSECT _DATA for the latter, where .PSECT stands for Program SECTion. 
Most embedded microprocessor systems will require text (which includes tables 
of constants) in ROM and use RAM for variable data. In certain circumstances the 
RTS assembler linker can handle two additional data sections, _ZPAGE for data 
which will lie in the direct/absolute-short memory areas (zero page) of MPUs such 
as the 6800/9 and 68000 devices and _BSS (Block Symbol Start) frequently used 
for variables which have no initial value (see Section 10.3). 

The Microtec Research Paragon 68K product^ used later in this section, can 
handle up to 16 program sections. This is useful where several non-contiguous 
memory chips are being targeted. For example, initialized variables could be put 
in a specific segment and placed in ROM. Later, at run time, they can be copied 
into RAM, where they can be treated as variables; that is changed at will (see 
Section 10.3). 

Some relocatable products do not permit absolute placement of code using the 
. 0RG directive, and in any case this is considered bad practice. The RTS products 
do permit relocatable ORGs, thus the fragment: 

START: - 


.0RG START + OFFEh 

.WORD ADDR1 

will place the data word ADDR1 0FFE:F/i bytes on from START. If you know that 
the linker will locate START at 0E000/1, then this will actually be at 0EFFE:F/i. 

That part of the machine code referring to labels, e.g. M0VEA. L #ARRAY,A0 in 
line 30 of Table RTH b). which are relocatable or external is not resolvable at this 
time. Thus the assembler must parallel the code streams with information relat¬ 
ing these bytes to their label. Object code also contains headers giving processor 
information, such as the order of address bytes (most or least significant first), 
size of processor words, length of symbols, number of machine-code bytes etc. 
With all this in mind it will be appreciated that relocatable object file formats are 
much more complex than their absolute counterparts of Table I7 a1 As a conse¬ 
quence of this, their structure is very much specific to each product. 

As our example for this section, we will follow through the program defined in 
Table IUT21 but this time using the more sophisticated Microtec Research Paragon 
68K assembler/linker. The instruction mn emonics and address mode represen¬ 
tations follow the standard Motorola conventions, but the directives differ con¬ 
siderably from the RTS mnemonics used up to now. Some key directives are: 

ident <name> Gives the module a name (identity), 

opt <flags> Options, such as CASE for case sensitivity. 

1 Microtec Research Inc., 2350 Mission College Blvd., Santa Clara, CA 95054, USA and Ringway 
House, Bell Road, Daneshill, Basingstoke, Hampshire, RG24 0FB, UK. 
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<namel> equ <value> Equates a symbol with an absolute value 

(similar to . def i ne). 

sect <n> Section number; equivalent to . psect. 

ds.b/.w/.l <n> Define Storage; reserve n bytes, words or 

long words (equivalent to . byte/ .word/. doubl e [n]). 
dc. b/. w/. 1 <n, - - , m> Define Constant; puts list of specified bytes, words 

or long words into section stream (equivalent to 

.byte n,-,m / .word n,-,m / .double n,-,m). 

xdef <name> Publishes symbol as global, i.e. Cross DEFine 

(equivalent to . publ i c). 

xref <name> Identifies symbol as external, i.e. Cross REFerence 

(equivalent to .external). 

The source code using this assembler for the Display module is given in Ta- 
ble l7.7[ a). I have placed data (ARRAY and X_C00RD) in Section 14 and the program 
text in Section 9. These are the sections chosen for data and text by the Microtec 


Table [7T71 Assembling the Display module with the Microtec Research Relocatable assem¬ 
bler (continued next page). 


opt E,CASE 

DISPLAY idnt 


* Background program which scans array of word data (ECG points) 

* Sends out to oscilloscope Y plates in sequence 

* At same time incrementing X plates 

* so that ARRAY[0] is seen at the left of screen 

* and ARRAY[255] at the right of screen 

* ENTRY : None 

* EXIT : Endless loop 


J 

DAC_X: 

equ 

6000h 

8-bit X-axis D/A converter 

DAC_Y: 

equ 

6001h 

12-bit Y-axis D/A converter 

J 

sect 

14 

Section 14 is Data space 


xdef 

ARRAY 

Make the array global 

ARRAY: 

ds . w 

256 

Reserve 256 words for the array 

X_C00RD: 

ds. b 

1 

and a byte for the X co-ordinate 

> 

sect 

9 

Program space 


xdef 

DISPLAY 

Make this program known to the linker 

DISPLAY: 

cl r ,w 

dO 

Get X co-ordinate byte 

DL00P: 

move.b 

X_C00RD,dO 

expanded to word 


1 si ,w 

#1, dO 

Multiply by two to give array index in DO.W 


movea.1 

#ARRAY,aO 

Point AO to ARRAY[0] 


move.w 

0(aO,dO.w),DAC_Y 

Get ARRAY[x] to oscilloscope Y plates 


move.w 

X_C00RD,DAC_X 

Send X co-ordinate to X plates 


addq. b 

#1,X_C00RD 

Go one on in X direction 


bra 

DLOOP 

and show next sample 


end 

DISPLAY 


(a) Source 

code for the Display module. 
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Table 7.7 (continued,). Assembling the Display module with the Microtec Research Relocatable as¬ 
sembler. 

Microtec Research ASM68008 V6.2a Page 1 Wed Jan 04 15:59:41 1989 


Line Address 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 
11 
12 


opt 

DISPLAY 


E,CASE 
i dnt 


Background program which scans array 
Sends out to oscilloscope Y plates in 
At same time incrementing X plates 
so that ARRAY[0] is seen at the left of screen 
and ARRAY[255] at the right of screen 
ENTRY : None 
EXIT : Endless loop 


of word data (ECG points) 
sequence 


13 

00006000 


DAC X: 

equ 

6000h 

8-bit X-axis D/A converter 

14 

00006001 


DAC_Y: 

equ 

6001h 

12-bit Y-axis D/A converter 

-L J 
16 

’ 



sect 

14 

Section 14 is Data space 

17 




xdef 

ARRAY 

Make the array global 

18 

00000000 


ARRAY: 

ds ,w 

256 

Reserve 256 words for the array 

19 

00000200 


X_C00RD 

: ds. b 

1 

and a byte for the X co-ordinate 

4U 

21 

* 



sect 

9 

Program space 

22 




xdef 

DISPLAY 

This program known to the linker 

23 

00000000 

4240 

DISPLAY 

: clr.w 

dO 

Get X co-ordinate byte 

24 

00000002 

1039 

DL00P: 

move.b 

X_C00RD,dO 

expanded to word 



0000 

0200 R 




25 

00000008 

E348 


1 si .w 

#1, dO 

x2 to give array index in D0.W 

26 

0000000A 

207C 


movea.1 

#ARRAY,aO 

Point A0 to ARRAY[0] 



0000 

0000 R 




27 

00000010 

31F0 


move.w 

0(a0,d0.w) ,DAC_Y ; Get ARRAY[x] to Y plates 



0000 

6001 




28 

00000016 

31F9 


move.w 

X_C00RD,DAC_X ; Send X co-ord to X plates 



0000 

0200 R 

6000 



29 

0000001E 

5239 


addq.b 

#1,X_C00RD; Go one on in X direction 



0000 

0200 R 




30 

00000024 

60DC 


bra 

DL00P 

and show next sample 

31 




end 

DISPLAY 





Symbol 

Tabl e 




Label 


Val ue 


ARRAY 

DAC_X 

DAC_Y 

DISPLAY 

DL00P 

X_C00RD 


14:00000000 
00006000 
00006001 
:00000000 
:00000002 
1:00000200 


9 

9 

14 


(b) Resulting listing file before linking. 


research C compiler, which we will use later; for example see Table [T0T9). 

The listing file produced after assembly is shown in Table fTTzl b). Both Data 
and Text sections are shown starting from OOOOOOOO/i; they will be subsequently 
located by the linker. This uncertainty also affects machine code relating to la¬ 
bels. Thus in line 29 the value of X_C00RD is replaced by its offset from the data 










1 82 C FOR THE MICROPROCESSOR ENGINEER 


segment's zero start value, that is 0200b. Notice that all lines with machine code 
which contains values to be relocated later are tagged with R. A symbol table is 
also produced by the Lister utility (as shown at the bottom of the listing), and 
this shows the section number followed by an offset for each relocatable symbol. 
Absolute symbols, such as DAC_X, have an absolute value attached. 

The output from the Update module source is shown in Table [TTHl Here we 
have an external symbol which is tagged with E in line 31 and identified in line 21. 
No value is given for ARRAY in the Symbol table, just External. 

The Vector code, shown in Table [7T9l similarly has external symbols which are 
tagged with E. This has been placed in Section 0, so that it can be linked in as the 
start of the 68000's vector table. 

The linker program, depicted in Fig. 17.2( b), has several tasks to perform: 

1. To concatenate code from the various input modules in the specified order, 
to give one contiguous object module. 

2. To resolve any intersegment and library symbolic references. 

3. To extract code from libraries into the output object module. 

4. To generate the object file together with any symbol, listing and link-time error 
files. 

In our example, incoming object modules contain code located in three sec¬ 
tors; 0, 9 & 14 (two for Tables RvTl and l6?2l _text and _data). The new composite 
sections are built up by concatenating like streams from the input object files 
as they come in. Unless otherwise directed, code from Section n simply begins 
where the last Section n input left off. Thus looking back at Table [6721 text in 
the Display module goes from 0400b-0425b and in the Update module from 
0426b - 0463b. However, the programmer can sometimes override this progres¬ 
sion by specifying a module's start address independently. This is how the Vector 
module's text was forced to run from 0000b - 0068b, as directed in Table r7.101 b). 

Table IZHOl shows the invocation of two linker programs. The top one, L0D68K 
by Microtec Research, is used in our example, whilst the bottom one, LINKX by 
RTS, was used to generate the code in Table [6721 

Taking the latter first, LINKX is followed by a Command line comprising a 
series of flags and file names, by which the programmer directs the action of the 
linker. The action commanded in Table [7~iTR b) is, reading from left to right: 


-tb 0000 Start text bias at zero (default) 

vector.o Scan object program vector. o (Table IU2t c)) 

-tb 0x400 Next text code starts from 400b 

-db OxEOOO and data code from EOOOb 

di spl ay. o Scan object program di spl ay. o 

update. o and then object program update. o 

-o output. XEQCreate a composite object file named output. XEQ 


Note the use of the C language prefix Ox to indicate hexadecimal. The action of 
these commands can clearly be seen by looking at the addresses of the resulting 
code of Table 16.21 
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ARRAY 14:External 
COUNTER 00009000 
INT_FLAG 00009800 
LAST_TIME 14:00000002 
UPDATE 9 :00000000 
UPDATE_I 14:00000000 
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Table 7.9 Module 3 after assembly. 

Microtec Research ASM68008 V6.2a Page 1 Wed Jan 04 16:32:28 1989 


Line Address 
1 

2 VECTOR 


opt E,CAS E 

i dnt 


3 

4 

5 

* Sets up Interrupt and Reset 

* using globally known labels 

vectors at bottom of ROM * 

through the linker * 

7 

8 


sect 

0 

; Use Section 0 for vector table 

9 


xdef 

VECTOR 

; Make this routine known globally 

10 


xref 

UPDATE,DISPLAY ; These will be got through link 

11 


SSP: 



12 00000000 

VECTOR: dc.l 

OFOOOh 

; Init value of the System Stack pointe 



0000 F000 



13 00000004 

PCR: dc.l 

DISPLAY 

; Co to DISPLAY routine on Reset 



0000 0000 E 



14 00000008 

ds. 1 

23 

; Other vectors not used here 

15 00000064 

LEVELl: dc.l 

UPDATE 

; Addr of Level 1 IRQ service routine 



0000 0000 E 



16 


end 

VECTOR 




Symbol Table 



Label 


Val ue 


DISPLAY External 
LEVEL1 0:00000064 
PCR 0:00000004 
SSP 0:00000000 
UPDATE External 
VECTOR 0:00000000 


In realistic cases the linker command sequence is complex and it is better to 
use a Command file, which is automatically read at link time. The Command file 
of Table f7~iT)l a)(i) is that read by Microtec Research's L0D68K linker to combine 
our three object modules. Here Program section 0 (used by the Vector module) 
is commanded to start at OOOOh (sect 0 = 0000) w hi lst Section 9 builds from 
0400h up (program) and Section 14 from OEOOOh up (data). Modules are scanned 
in the order given by the Load commands. The Absol ute command selects which 
code sections go into the final absolute hex file (see Table IzTTl b)): I have omitted 
RAM data (Section 14) from this request. 

The Command line of Table fTHOl aKii) gives the name of the Command file 
(prefixed by @), the name of a special Map listing file and the name of the absolute 
Machine-Code file; the latter two are shown in Table [77111 
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Table 7.10 Linking the three source modules. 


This is the Command file used for the Microtec Research linker 
to combine the three modules previously assembled together 
It is called DISPLAY.CMD 


Section 0 is for the Vector table 
Section 9 is for text 

Section 14 is for unitialized variables in 


RAM 


sect 0 = 0000 
sect 9 = 0400h 
sect 14= OEOOOh 
absolute 0,9 
list d,s,t,x,c 
load vector.obj 
load display.obj 
load update.obj 
end 


Vector table starts at 0000 for the 68000 
Program starts at 0400h (ROM) 

Data starts at EOOOh (RAM) 

Only put Sections 0 and 9 in the hex file 

Options for the Listing file 

Load and scan the Vector file first 

Then the background file 

and finally the Interrupt service file 


i: The Command file 

L0D68K @display.cmd,display.map,display.abs 
ii: Evoking the Microtec linker LOD68K 

(a) Linking using the Microtec products. 

LINKX -tb 0000 vector.o -tb 0x0400 -db OxEOOO display.o update.o -odisplay.xeq 

(b) The equivalent linking process using the RTS products, see Table [6T2j 


While code is being entered from the various input object files, a composite 
symbol table is being built up by the linker. For our example this combined 
symbol table is shown in the Map file produced by L0D68K to give the final location 
(i.e. map) of all code sections and symbols. There are three types of symbols 
entered into the linker. Absolute symbols have been given a fixed value by the 
programmer. These are usually known addresses of external hardware, such as 
the X and Y digital to analog converters of our Update module. These are marked 
as ABSC0NST under SECTION in the map. 

Defined symbols are assigned relative to the beginning of the module they are 
created. Thus DL00P is indicated as Section 9 Offset 00000402 in the Display 
module. 

Symbols referred to but not actually defined in a module are usually assigned 
a value when all the code is in. They must be declared Public where they are 
defined. When known, the value of a ref (referred to) symbol is substituted in the 
code where they are referred to. Public symbols are listed separately in the Map 
file. 

The LOD68K linker does not give an Absolute listing file output, u nl ike the 
LINKX product (via a utility program ABSX). However, Table [6721 is indicative of 
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how it would look. 

If any ref symbols remain unresolved, the linker will scan such library files 
as are indicated in the Command line or file. A library file typically comprises a 


Table r7TTTl Output from the Microtec linker (continued next page). 

Microtec Research Lod68K V6.2a Thu tan 05 10:17:34 1989 

OUTPUT MODULE NAME: display 

OUTPUT MODULE FORMAT: MOTOROLA S2 

SECTION SUMMARY 


SECTION 

ATTRIBUTE 

START 

END 

LENGTH 

ALIGN 

0 

NORMAL DATA 

00000000 

00000067 

00000068 

2 (WORD) 

9 

NORMAL CODE 

00000400 

00000463 

00000064 

2 (WORD) 

14 

NORMAL DATA 

0000E000 

O0O0E205 

00000206 

2 (WORD) 


MODULE SUMMARY 


MODULE 


S ECTION:START S ECTION:END FILE 


VECTOR 

DISPLAY 

UPDATE 


0:00000000 

0 

14:OOOOEOOO 

14 

9:00000400 

9 

14:0000E202 

14 

9:00000426 

9 


00000067 vector.obj 
0000E200 display.obj 
00000425 

0000E205 update.obj 
00000463 


LOCAL SYMBOL TABLE 


SYMBOL 

ATTRIB 

SECTION 

OFFSET 

MODULE 

FUNCTION 

ARRAY 

ASMVAR 

14 

OOOOEOOO 

DISPLAY: 

COUNTER 

ASMVAR 

ABSCONST 

00009000 

UPDATE 


DAC X 

ASMVAR 

ABSCONST 

00006000 

DISPLAY: 

DAC Y 

ASMVAR 

ABSCONST 

00006001 

DISPLAY: 

DISPLAY 

ASMVAR 

9 

00000400 

DISPLAY: 

DLOOP 

ASMVAR 

9 

00000402 

DISPLAY: 

INT FLAG 

ASMVAR 

ABSCONST 

00009800 

UPDATE 


LAST TIME 

ASMVAR 

14 

0000E204 

UPDATE 


LEVEL1 

ASMVAR 

0 

00000064 

VECTOR 


PCR 

ASMVAR 

0 

00000004 

VECTOR 


SSP 

ASMVAR 

0 

00000000 

VECTOR 


UPDATE 

ASMVAR 

9 

00000426 

UPDATE 


UPDATE I 

ASMVAR 

14 

0000E202 

UPDATE 


VECTOR 

ASMVAR 

0 

00000000 

VECTOR 


X_C00RD 

ASMVAR 

14 

0000E200 

DISPLAY: 

PUBLIC SYMBOL TABLE 






SYMBOL 

SECTION 


ADDRESS 

MODULE 


ARRAY 

14 


OOOOEOOO 

DISPLAY 


DISPLAY 

9 


00000400 

DISPLAY 


UPDATE 

9 


00000426 

UPDATE 


VECTOR 

0 


00000000 

VECTOR 



(a) Map file. 
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Table 7.11 (continued! Output from the Microtec linker. 

S00600004844521B 

S 2 OCOOOOOOOOOOFOOOOOOOO 4 OOFF 

S2080000640000042669 

S21400040042401039OO0OE2O0E3482O7COOOOEOOO93 
S21400041031F00000600131F90000E200600052395E 
S20A0004200000E20060DCB3 

S21400042 648 E7C08042 790000980030390000900006 

S214000436320092790000E20433COOOOOE204424033 

S21400044630390000E202E348207COOOOE0003181FB 

S212000456000052790000E2024CDF01034E73F4 

S5030009F3 

S804000426D1 


(b) Flex file. 


series of object-code programs, each headed up with a name and a code length. 
Should an unresolved symbol match such a name, the succeeding code is ex¬ 
tracted and added to the appropriate Program sections already formed by the 
linker. Thus, unlike a normal object-code file, only relevant portions of a library 
file are extracted and used. The linker recognizes a library file from its unique 
header. A typical evocation of a linker using a floating point mathematics library 
might be: 


- Merge filel object 

- with file2 object 

- both of which can use 

routines found in the library 

LINK filel.o file2.o fpoi nt. "I i b 

Libraries are typically provided by compiler manufacturers, covering mathe¬ 
matical, string and input/output functions. Compilers and assemblers usually 
come with a utility program known as a librarian. The programmer uses the li¬ 
brarian to build up his/her own personal libraries or to modify existing ones (see 
also Section 9.4). 

Symbols which remain unresolved after the linking process are indicated as 
errors. Whether they are depends on the format of the output object code. If 
this is in the same relocatable mode as the input, then the resulting file can 
be subsequently linked again with other object files. An absolute format, such 
as shown in Table 17.51 precludes any further processing of this nature. Some 
Linkers, such as the RTS product, naturally produce the former and require a 
utility program, often called a Hexer, to extract an absolute file. 

From Fig. I7.2l we see that the end of the translation chain is the Loader pro¬ 
gram. The purpose of a Loader is to take the object code and place it in memory, 
from where it can be run. The operation of the Loader depends somewhat on 
the relationship between the computer doing the translation (i.e. assembly and 
linkage) and the target system that will actually run the generated code. 








<Assembler> 
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Figure 7.3 Assembly environment. 


Target system 
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In the situation where they are the same, as depicted in Fig. l7.3t a). the Loader 
will frequently be part of the computer's operating system. Such a Loader can 
usually deal with both relocatable and absolute object files produced by a Linker. 
In the former case the operating system decides on the location of the various 
code streams; in the latter the programmer can influence this decision through 
the Linker. In such a resident system, the user program in its object form nor¬ 
mally resides on disk. When the operator decides to run the program, the operat¬ 
ing system first loads and locates the code, then proceeds directly to execution; 
that is load and go. Some configurations, mainly mainframe, combine the linkage 
and loading operations in one Linker-Loader program. 

Although it is possible to interface devices to a computer and use a resident 
configuration, in engineering applications the cross-target arrangement depicted 
in Fig. |7.3[ b) is the more usual. Here the microprocessor-oriented hardware is 
distinct from the computing apparatus doing the code conversion. Indeed it is 
unlikely that they even use the same processor. In this situation the assembler 
is known as a cross-assembler as opposed to a resident assembler. Where the 
user hardware is a dedicated controller with its software in ROM, the Loader must 
be in the target system. This may well be part of the operating software of an 
intelligent EPROM programmer, into which absolute object code is downloaded 
into a RAM buffer for later progra mm ing. The blown EPROM is then moved by 
hand to the target. Alternatively during development the Loader may be in an in- 
circuit emulator interface package or the operating system of a microprocessor 
development system (MDS). In all of these cases it is likely that the Loader will 
act on absolute object code, such as depicted in Table 1731 Absolute Loaders are 
somewhat less complex than their relocatable counterparts, which are used in 
resident configurations. 

The cross environment is necessary because dedicated microcontrollers rarely 
have the facilities necessary to develop their own software. The additional soft¬ 
ware and hardware resources necessary for this purpose cannot be integral to the 
system as they must be easily jettisoned when their use is over. Targets of this 
form, without their own general-purpose operating system, are often referred to 
as naked systems. The use of a general purpose computer with an in-circuit 
emulator can be thought of as supplying these resources to a naked system in a 
form that can be readily disengaged when no longer needed. 


7.3 The High-Level Process 

We have defined a high-level language as a code which is modelled more on 
the algorithm of the problem rather than on the underlying machine which will 
actually solve it. The level of a language can be quantified as a function of the 
' distance' it is removed from its ultimate machine code. The compiler is then the 
system program which translates from one language to another llOl . Strictly this 
includes programs which convert between high-level languages such as Pascal 
to C (PTC) or BASIC to C (BASTOC). In this book, we use the term compiler in 
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its narrower sense, to denote translation to the target processor machine-level 
language. 

In principle an assembler and compiler carry out a similar task, but clearly 
the latter has a much more onerous burden to discharge. There are two parts to 
a compilation process: analysis and synthesis. The analysis part separates the 
source text into the constituent parts of which the language is composed. This 
is akin to the verbs, nouns, adjectives etc. of human language. The structural 
relationship of these elements, called leximes, must then be ascertained. The 
synthesis part generates the desired code from the intermediate representation 
created by the analysis phases. 

All this is easily stated, but the details are of necessity rather complex and of 
little relevance to other than compiler designers; interested readers are directed 
to references 1TT1 T2l . However, it is instructive to expand a little on the process 
of compilation. 

Lexical analysis, or parsing, subdivides the source language into its funda¬ 
mental chunks or tokens, and identifies each token, whether operator, constant, 
variable etc. Thus the two characters >= may be recognized as a relational op¬ 
erator of type Greater or Equal, and this could be coded as REL_0P GE. Taking a 
slightly more meaningful example, consider the expression: 

sum = (n + 1) * n/2; 

which evaluates the sum of all integers up to n. A Lexical analysis would produce 
something like that shown in Table [7T21 where each chunk is parsed into a token 
and an attribute. For instance, the variable sum is an identity (the name of a 
variable is commonly known as its identity) and its attribute is an address or 
pointer into the Symbol table. 


Table 7.12 A possible Lexical analysis of sum = (n+l)*n/2; 

Source Expression Token Attribute Comment 


sum 

id 

Pointer to Symbol table entry for sum 

Identity sum 

= 

assign-op 

— 

Assignment operator 

C 

par-op 

L 

Parenthesis, Left 

n 

id 

Pointer to Symbol table entry for n 

Identity n 

+ 

add-op 

— 

Addition operator 

1 

const 

1 

Constant value 1 

) 

par-op 

R 

Parenthesis, Right 

* 

mul-op 

— 

Multiply operator 

n 

id 

Pointer to Symbol table entry for n 

Identity n 

/ 

div-op 

— 

Divide operator 

2 

const 

2 

Constant value 2 

> 

end-op 


End-of-statement op 


This example is fairly simple to decompose into its constituent elements. As 
a more difficult situation, consider the fragment of C source code: 


n+++m 



THE HIGH-LEVEL PROCESS 


191 


Assign 



Figure 7.4 Syntax tree for sum = (n+1) * n/2; 

where the ++ operator following a variable means Auto-Increment after use, and 
prior to a variable means Auto-Increment before use. A single + in the normal 
way means Add. Are the tokens n++ +m or n-i- ++m? 

Actually, the former is the correct interpretation, as C compilers analyze using 
the 'ma xi mal munch’ strategy IflU . Here the parser moves from left to right, 
biting off the longest possible token; hence ++ first followed by +. Thus the 
expression means add to m the post-incremented value of n. 

A Lexical analysis says nothing about the relationships between the various 
leximes. For this, a subsequent Syntactic analysis must be performed. The inter¬ 
relationship and order of operators, constants and variables for our example is 
shown in the Syntax tree of Fig. 17.41 The expression to the right of the assignment 
operator is evaluated from the most distant parts up: that is, first add n + 1 then 
multiply by n and then divide by 2. The variable sum is finally overwritten by this 
value. This process is governed by the precedence order defined by the language 
(e.g. multiplication has a higher precedence than addition), direction of evaluation 
(e.g. right to left) — see Table l8~4l f or an example of both of these —, parenthesis, 
brackets, loop constructs etc. More elaborate Syntax trees are often called Parse 
trees. 

Parse trees are in turn subjected to Semantic analysis. This gathers type in¬ 
formation relevant to the coming code-generation phase. In particular the type 
and size of variables and constants need to be checked and altered according to 
the rules of the language. For example in C if we have an expression of the form: 
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Z = X + Y; 

where X has been declared an integer (say 32 bits) and Y a short integer (say 
16 bits), then Y must be expanded to 32 bits before the addition is performed. 
Other type conversions include signed and unsigned combinations, floating and 
fixed-point mixes etc. Errors may be reported during this phase as well as all 
previous phases. 

The output of the Semantic analysis is a type of Intermediate code. Although 
Intermediate code is independent of a real machine, it nevertheless reflects the 
type of operation available in the target. The synthesis of real machine code 
involves the determination of storage requirements and addressing algorithms 
for the variables and the expansion of the Intermediate code statements to se¬ 
quences of machine-specific instructions. Intermediate code for our example may 
be something like: 

1. Move integer n in from memory. 

2. Put in multiplier location. 

3. Add one. 

4. Move to the multiplicand location. 

5. Multiply them. 

6. Divide the returned value by two. 

7. Move out to where sum is. 

The actual machine code produced by the Cosmic 6809 C cross-compiler V3.1 
for this example is shown in Table 177131 Notice how closely it mi rrors this pseudo 
code. 

The front end of the compiler covering the analysis phases through to the 
production of Intermediate code is mainly a function of the source language and 
largely independent of the target machine. The back end of the compiler includes 
those portions of the compiler that generate the specific target language. In 
theory the target may be changed by replacing only the back end components. 

The code produced by the compiler illustrated in Table 17.131 is in normal 
assembly-level format. To complete the production of machine code, it can be 
passed on through the chain of Fig. 17.21 that is the assembly process. Other mod¬ 
ules from high-level language programs, assembly-level programs and libraries 
can be linked to generate the final executable code. 

The compilation process used to produce the code in Table 17.131 is shown in 
Fig. 17.51 The Whitesmiths Group series of compilers use separate programs to 
implement the various processes discussed above El. These are: 

PP 

The preprocessor implements the Lexical analysis. Also expands out #i ncl ude 
file and #def i ne substitutions and macros. 

Pi 

Performs Syntax and Semantic analysis to produce intermediate code. 
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p2nn 

Generates source code for machine nn's assembler. For example, p209 synthe¬ 
sizes source code for a 6809 MPU, p280 for 8080/8085 processors, 
optnn 

This is an optional peephole optimizer which eli mi nates redundant instruc¬ 
tions generated by p2nn. 

Not shown in the diagram is the Listing utility which produces interleaved 
listings of assembly-level statements as comments. Also the optional front-end 
Pascal to C (PTC) translator used when Pascal source is desired. 

Splitting up the compiler into separate programs has the advantages of flexi¬ 
bility and requires less in the way of memory capacity of the computer. A single 
composite compiler program, as used in most commercial products, is much 
faster but does demand more of the translating engine. In both situations, var¬ 
ious options are selected by following the program(s) with flags in the form of 
command lines or files, much as depicted in Table fTTol 

The top right process box in Fig. l7.5l is labelled Peephole optimizer. Depending 
on the sophistication of the compiler translation, a variable percentage of the 
machine code produced is either inefficient or redundant. For example, as each 
high-level code statement is processed separately, a subsequent translation may 
not be aware that a variable that it requires is already down in a Data register. 


Table 7.13 6809 target code for sum = (n+1) * n/2; 
; Compilateur C pour MC6809 (COSMIC-France) 

.processor m6809 


.psect _text 

L3_n: .byte 0 

L31_sum: .byte 0,0 

; unsigned int sum_of_n() 
; { 

.psect _text 


_sum_of_n: 

; static unsigned char n; 

; static unsigned int sum; 
; sum = (n+l)*n/2; 


ldb 

L3_n 

(Move integer n in from memory 

) 

cl ra 


(Expand to integer in (D) 

) 

tfr 

d, x 

(Move to multiplier location (X) 

) 

addd 

#1 

(Add one. In multiplicand location (D) 

) 

jsr 

c_imulx 

(Multiply them 

) 

1 dx 

#2 

(Prepare to divide returned value by 2 

) 

jsr 

c_idivx 



std 

L31_sum 

(Move out to where sum is stored 

) 

return(sum); 




rts 


(Returned in Accumulator D 

) 


} 

.public _sum_of_n 
.external c_idivx 
.external c_imulx 
. end 
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Accessing an array element in a loop may require the use of a complex address 
mode to calculate its location, (see line 27 of Table [7.7l as an example). However, 
if the array elements are going to be accessed sequentially, it will be faster to 
load an Address register prior to entering the loop with the address of the array 
element to be accessed on the first loop iteration. Thereafter, indirect addressing 
with automatic increment/decrement can be used. This is known as strength 
reduction. There are of course obvious faux pas such as using multiplication for 
the function X * 1! 

Peephole optimization is a method of improving the quality of the machine 
code by moving a small window over the target program looking for redundancies. 
This window is typically 30- 100 code lines. In general, the window-sized scan 
will be repeated until no further improvements can be made. 

There are many different types of techniques for optimization transforma¬ 
tions. Reference n~n gives an overview of this area. For example, the Microtec 


Table f7~T4l Passing a simple program through the compiler of Fig. \7 ,5\ (continued next 
page). 

unsigned int suin_of_n() 

{ 

static unsigned int n; 
static unsigned int sum; 
sum=0; 
whiie(n>0) 

{ 

sum=sum+n; 
n=n-l; 

} 

return(sum); 

} 

(a) C source. 


; Compilateur C pour MC6809 (COSMIC-France) 



.processor 

m6809 




.psect 

data 



L3_n: 

. byte 

0,0 



L31_sum: 

. byte 

0,0 




.psect 

_text 



_sum_of_n: 

ci ra 


(sum=0000 

) 


cirb 





std 

L31 sum 



LI: 

1 dx 

L3_n 




cmpx 

#0 

(n>0? 

) 


jbeq 

Lll 

(Exit IF true 

) 


ldd 

L31 sum 

(Get sum 

) 


addd 

L3_n 

(Add n to it 

) 


std 

L31 sum 

(= new sum 

) 


ldd 

L3_n 

(Get n 

) 


addd 

#-l 

(Subtract 1 

) 


std 

L3_n 

(n=n-l 

) 


jbr 

LI 

(Repeat while 

) 

Lll: 

ldd 

L31_sum 

(Return sum 

) 


rts 





.public 

_sum_of_n 




end 


(b) Resulting assembly source code. 
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Table [7~T4l Passing a simple program through the compiler of Fig. |7.5 1 (continued next 
page). 


; Compilateur C pour MC6809 (COSMIC-France) 


.processor m6809 



.psect 

data 


L3_n: 

. byte 

0,0 


L31_sum: 

. byte 

0,0 


; unsigned int sum 

_of_n() ; 

{ 


.psect 

_text 


; static 

unsigned 

int n; ; 

static unsigned i 

_sum_of_n 

cl ra 




cl rb 




std 

L31_sum 


; line 6 

; while(n>0) 


LI: 

ldx 

L3_n 


.***** cm p x #o (Removed by optimizer) 


]beq 

Lll 



{ 

| 

sum=sum+n 


ldd 

L31 sum 



addd 

L3_n 



std 

L31_sum 



n=n-l; 




ldd 

L3_n 



addd 

#-l 



std 

\ 

L3_n 



s 

jbr 

LI 


; line 10 

; return(sum); 


Lll: 

ldd 

L31_sum 



rts 



; } 





.pub!ic 

_sum_of_n 



(c) Optimized, with C source interspersed. 


1 ; Compilateur C pour MC6809 (COSMIC-France) 


2 



.processor 

m6809 

3 



.psect 

data 

4 

0000 

0000 L3 n: 

. byte 

0,0 

5 

0002 

0000 L31_sum: 

. byte 

0,0 

6 ; 

unsigned int sum 

_of_n() 


7 ; 

{ 




8 



.psect 

_text 

9 ; 

static unsigned 

int n; 


10; 

static unsigned 

int sum; 


11; 

sum=0; 



12 

E000 

4F sum of n 

: cl ra 


13 

E001 

5F 

cl rb 


14 

E002 

FD0002 

std 

L31_sum 

15; 

line 

6; while(n>0) 



16 

E005 

BE0000 LI: 

ldx 

L3_n 

17; 

it it it •it it 

cmpx #0 



18 

E008 

2714 

jbeq 

Lll 

19; 


{ 



20; 


sum=sum+n; 



21 

E00A 

FC0002 

ldd 

L31 sum 

22 

E00D 

F30000 

addd 

L3_n 

23 

E010 

FD0002 

std 

L31_sum 

24; 


n=n-l; 



25 

E013 

FC0000 

ldd 

L3_n 

26 

E016 

C3FFFF 

addd 

#-l 

27 

E019 

FD0000 

std 

L3_n 

28; 


} 



29 

E01C 

20E7 

jbr 

LI 

30; 

line 

10; return(sum) 

J 


31 

E01E 

FC0002 Lll: 

ldd 

L31_sum 

32 

E021 

39 

rts 


33; 

} 




34 



.public 

_sum_of_n 

35 



. end 



(d) Object listing. 
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Table 7.14 (continued) Passing a simple program through the compiler of Fig. fZ5l 
: 2 OEOOOOO 4 F 5 FFDOOO 2 BEOOOO 2714 FCOOO 2 F 3 OOOOFDOOO 2 FCOOOOC 3 FFFFFDOOOO 2 OE 7 FCOOAD 
:02E020000239C3 
: 0400000000000000FC 
:00E000011F 

(e) Machine code. 

Research Paragon MCC68K Version 3 C cross-compiler has five optional optimiza¬ 
tions. 

Some optimizations can be dangerous in certain situations. The classical case 
involves a program testing the data in an external peripheral's Control regis¬ 
ter, perhaps the Transmit_Data_Register_Empty_flag in aUART. The compiler will 
translate this into a loop which repetitively brings in the Transmit Control regis¬ 
ter (TCR) state, checks the appropriate bit and repeats unless True. However, the 
optimizer may decide that once the variable is down in the MPU's Data register, 
then it is a shame to keep bringing it down each loop pass; why not bring it down 
just once before the loop is entered. Of course the optimizer does not realize 
that the variable TCR can be altered, seemingly spontaneously, by some agency 
outside the sphere of influence of the software. ANSII C has a type of variable 
known as vol ati 1 e, which warns optimizers to leave well alone. 

As an example, Table fTTT-fl a) shows a simple C program passed through the 
chain of Fig. 17.51 We will look at this program in more detail in the next chapter, 
but essentially it comprises a loop adding a decrementing 8-bit integer n to the 
16-bit integer variable sum (see Table 2.1). Assembly source code produced by the 
compiler's second pass in Table EUb) is sent through the optimizer, which re¬ 
moves or changes relevant instructions. These are normally shown as comments 
in the output listing. In Table [7T4l c). line 17 has been commented out from the 
original source code. The optimizer has recognized that the previous line, which 
loads the variable n into the X Index register, also sets the Z flag if n is zero. Thus 
the subsequent comparison with zero (to test the condition n > 0?) is redundant. 
If the while operand had been anything other than zero, for example (n > 1), 
then the generated instruction CMPX #1 would be valid. 

Not all compilers produce assembly-level code for assembly and linkage, al¬ 
though most Fortran and C translators do. A load-and-run compiler may directly 
generate machine code, load it and execute. 

The most common alternative to compilation is interpretation. In this situ¬ 
ation the source code is not translated but run as it is. An interpreter program 
must be resident with this source at run time, as it translates each line on the fly' 
and then executes it. This process is of course very slow compared to running a 
machine-code program directly. However, developing such programs is faster, as 
a recompilation is not necessary after each change in the source. For small pro¬ 
grams, an interpreter represents a large overhead, as it must be accommodated 
in target memory in tandem with the source. Nevertheless, high-level source code 
is more compact than its equivalent machine code and very large interpreted pro¬ 
grams may actually require less storage, even with the resident interpreter. The 
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BASIC language is usually run under an interpreter, although compilers are avail¬ 
able for this language and may be used once the interpreter-based development 
has been completed. C interpreters are also available as a development aid, but 
are rarely used. 

A compromise is sometimes effected, where a compiler produces an interme¬ 
diate code and at run time a much simplified interpreter executes' this code as 
its source. Pascal is traditionally used in this manner (the compiler producing 
p-code). 
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CHAPTER 8 


Naked C 


In the beginning there was CPL (Combined Programming Language), a language 
developed jointly by Cambridge and London universities in the mid 1960s. BCPL 
(Basic CPL) was a somewhat less complex but more efficient variant designed as a 
compiler-writing tool in the late 1960s At around that time, Bell System Lab¬ 
oratories were working on the UNIX operating system for their DEC PDP series of 
minicomputers. Early versions of UNIX were written in assembly language 0. In 
an attempt to promote the spread of this operating system to different hardware 
environments, some work was done with the aim of rewriting UNIX in a portable 
language. The language B j3], which was essentially BCPL with a different syntax 
(and was named after the first letter of that language), was developed for that 
purpose in 1970 ID, initially targeted to the PDP-11 minicomputer. 

Both BCPL and B used only one type of object, the integer machine word (16 bits 
for the PDP-11). This typeless structure led to difficulties in dealing with indi¬ 
vidual bytes and floating-point computation. C (the second letter of BCPL) was 
developed in 1972 to address this problem, by creating a range of objects of both 
integer and floating-point types. This enhanced its portability and flexibility. 
UNIX was reworked in C during the summer of 1973, comprising around 10,000 
lines of high-level code and 1000 lines at assembly level QQ. It occupied some 
30% more storage than the original version. 

Although C has been closely associated with UNIX, over the intervening years 
it has escaped to appear in compilers running under virtually every known op¬ 
erating system, and targeted to mainframe CPUs down to single-chip microcon¬ 
trollers. Furthermore, although originally a systems progra mm ing language, it is 
now used to write applications programs ranging from CAD down to the intelli¬ 
gence behind mi crowave ovens and smart egg-timers! 

For over ten years, the official definition of C was the first edition of The C Pro¬ 
gramming Language, written by the language's originators Brian W. Kernighan 
and Dermis M. Ritchie El. It is a tribute to the power and simplicity of the lan¬ 
guage, that over the years it has survived virtually intact, resisting the tendency 
to split into dialects and new versions. In 1983 the Am erican National Standards 
Institute (ANSI) established the X3J11 committee to provide a modern and com¬ 
prehensive definition of C to reflect the enhanced role of this language. It took 
nearly a decade to finally approve the resulting definition, known as Standard or 
ANSII C. 

In the event, the philosophy of the standard was to alter Old C as little as 
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possible, and that such changes should allow existing programs to compile with, 
at most, minor changes. The two major changes were the tightening up of the 
syntax for declaring and defining functions, so that the compiler can report er¬ 
rors due to mismatched arguments. The original specification did not define 
the libraries accompanying C; although many such functions became de facto 
standards, there were many portability problems. ANSI C has a standard library, 
which is a specified part of the language running in a hosted environment; that 
is with an operating system in situ. 

Within the scope of this book, it is impossible to do more than survey the 
elements of programming in C. There are many excellent texts devoted entirely 
to this end, some of which are listed at the end of this chapter [71 151151 [TUI |11| . To 
reduce the size of this summary, aspects of C which are unlikely to be of interest 
to non-hosted environments, that is naked MPU-based systems, have been omit¬ 
ted, for example, file and terminal I/O functions. In addition I have concentrated 
on the newer ANSI C language, which I have given the generic term C. Where the 
original specification is alluded to, the term old C has been used. At the time of 
writing, virtually all compilers are implementing ANSI C. 


8.1 A Tutorial Introduction 

Let us begin by taking a simple but functional C program and dissect it line by 
line. 


Table 8.1 Definition of function sum_of_nQ. 


Function sums all integers up to n (maximum 65,535) 
ENTRY : n passed as unsigned short 
EXIT : sum returned as unsigned int 


unsigned int sum_of_n(unsigned short int n) 
{ 


8 unsigned int sum; 

9 sum=0; 


10 

while(n>0) 

/* 

For 

as long as n is greater than zero 

V 

11 

{ 





12 

sum=sum+n; 

/* 

add 

n to the sum 

V 

13 

n=n-l; 

/* 

and 

decrement n 

V 

14 

} 





15 

return(sum); 





16 

} 






The program itself, shown in Table 18.11 is a slightly modified version of Ta¬ 
ble I7.141 al. It is written in the form of a subroutine (known in C as a function) 
with the variable n being passed to it from the calling program, and sum being 
returned to the caller. The algorithm continually adds n to the initially cleared 
sum, as n is decremented to zero. 
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1-5: These five lines are comments. Any characters between delimiters /* and */ 
are regarded as a single space by the compiler. Comments can be anywhere 
where whitespace (the collective term for blank, tab or newline) can appear. 
Thus lines 10, 12 and 13 have comments after the executable part. Generally, 
whitespace is used as a matter of style to make the code easier to read. The 
language itself is entirely freeform, provided that the various statements etc. 
can be distinguished by the preprocessor. 

6: This line names the function sum_of_n and declares that it returns an un¬ 
signed integer value and acts on an unsigned short integer variable n passed 
to it by the calling program. Setting out the function parameters like this is 
known as prototyping. Objects of type i nt mean that such variables have 
fixed-point (as opposed to real floating-point) values. A short integer size is 
typically 16 bits whilst a plain integer is typically 32 bits (see Fig. |8.3| l. Here 
they are to be treated as unsigned numbers. 

7: A left brace thus { is equivalent to begin in Pascal. All begins must be 
matched with an end, or in C a right brace }. It is good programming style 
to indent each begi n from the column of the immediately preceding line(s) 
and to ensure that begi n and end braces line up. In this case line 16 is the 
corresponding end brace. Between lines 7 and 16 is the body of the function 
sum_of_n(). 

8: There is only one variable which is local to our function. Its name and type 
are defined here. In C all variables (unless external) must be defined before 
they are used. Conventionally, all variables are defined at the beginning of the 
function. A definition tells the compiler what properties the named variable 
has, for example size, so that it can allocate suitable storage. Several variables 
of the same type may be defined in the one statement, for example: 
int varl, var2, varB; 

The line is terminated by a semicolon ; as are all statements in C. 

9: Here we assign (=) the value 0 to the variable sum, that is clear it. A definition 
and an initializing assignment can frequently be combined; thus: 

unsigned int sum = 0; 

is a legitimate statement combining lines 8 and 9. 

10: In evaluating sum we need to repeat the same process for as long as n is greater 
than zero. This is the purpose of the whi 1 e loop introduced in this line. The 
body of this loop, that is the statements which appear between the following 
left and right braces of lines 11 and 14, is continually executed for as long 
as the expression in the parentheses evaluates to non-zero (True in C). This 
test is done before each pass through the body. Thus in our example, on 
entry the expression n > 0 is evaluated. If True, then n is added to sum, n is 
decremented, and the loop test repeated. In this case, eventually n reaches 
zero. Then the expression n > 0 evaluates to False (zero), and the statement 
following the closing brace is entered (line 15). 

An alternative is whi 1 e(n), which will also terminate when n reaches zero 
(False). This is si mi lar to the difference at assembly level between the Test and 
Compare operations. 
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11: The begin brace defining the whi 1 e body. Notice that for style it is indented. 

1 2: The right expression to the assignment is evaluated, sum + n, and the result¬ 
ing value (r_value) given to the left variable (Lvalue), sum. The expression 
sum += n; in C is equivalent and means increment sum by n. 

1 3: Here one is subtracted from n and the result becomes the new n. The decre¬ 
ment operator -- can also be used giving the expression n-- ;. 

14: The end brace. Notice the style with the opening brace (in line 7) and clos¬ 
ing brace vertically aligned. This reduces the chance of an error in complex 
expressions. Braces are used to surround compound statements; that is se¬ 
quences of single statements. Such blocks can be treated in exactly the way 
a single statement is dealt with. Except where they surround the body of a 
function, braces may be omitted when the block has only one statement (a 
simple statement). In our example, lines 9-14 could be replaced by: 

while(n) sum+=n--; 

which reads: while n is non-zero, add n to sum and decrement n. C can be 
written in a terse style like this, but the result can be difficult to read. The 
style used in this book would be: 

whi1e(n) 

{sum += n--;} 

where I have used (the optional) braces for clarity. 

1 5: Only one value can be returned directly from a function and this is specified 
by the return operator. The type of this parameter was given in the prefix of 
the function declared in the prototype of line 6. The value of the function is 
the value of this variable. Thus if we had a function that returned the square 
root of a constant passed to it, then the expression in the calling function: 

x = sqr_root(y); 

would assign the value of sqr_root(y), that is its returned value, to x. 

16: The closing brace for function sum_of_n(). 

At a simple level, our dissection has given us a feeling for the basic architecture 
of a C function. C programs are normally structured in a modular fashion with a 
central function, conventionally named mai n(), calling up a series of functions, 
some of which may be from a library. Functions can of course be nested. This 
structure is shown in Fig. 18.11 

We will spend the remainder of this and the next chapter exploring the basic 
concepts informally introduced here, and enlarging our repertoire of C operations 
and constructions. 


8.2 Variables and Constants 

C has a rich variety of elementary objects, upon which are based the more com¬ 
plex groupings of strings, arrays and structures. Objects have properties such 
as size, structure, range and scope, some of which are summarized in Fig. 18.21 
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Main routine 



etc 


Figure 8.1 Structure of C programs. 

We will discuss these properties at some length in this section, except for scope 
which is deferred until Section 9.1. 

Simple objects are based on a fixed set of basic types, which are illustrated 
in Fig. 18.SI The fundamental division is between real and integer forms. The 
former are valued in terms of floating-point numbers, with sign, magnitude and 
exponent parts. Three real types are specified, namely float, double float and 
long double float. C does not guarantee that the three types will in any given 
implementation differ in precision, only that a double float object will never 
be of lower precision than a plain float equivalent, and similarly that a long 
doubl e f 1 oat will never be of lower precision than a doubl e f 1 oat equivalent. 
The actual format is implementation dependent, but typically conforms to the 
ANSI Standard 754-1985 H2 shown in Fig. [Q1 

Most microprocessor-target implementations treat long double objects as 
the default double. Some also permit the optional situation where all real types 
are treated as single-precision float objects. This gives faster processing at the 
expense of precision, especially when real operations are not implemented in a 
mathematics co-processor. Even when only one or two precisions are actually im¬ 
plemented, it is not considered an error to declare an object of an unimplemented 
size. For example, where an implementation only supports a single- and double- 
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Figure 8.2 Properties of simple object types. 

precision format, an object declared long double float is represented in the 
double precision format, the best possible, doubl e f 1 oat can be abbreviated to 
just double, likewise 1 ong double float to long double. 

Four integer types representing objects in a fixed-point format are also spec¬ 
ified. They are in rising order of range: char, short int, int and long int. 
Their range is implementation dependent, and typically only three actual sizes 
are supported. The char type is nearly always an 8-bit byte, and is named for 
an object just big enough to hold a single character (typically, but not always, 
ASCII-coded). However, the char type object is a true integer and can be used 
to define a basic (byte) unit of memory or 8-bit I/O port. A plain int object is 
supposed to be the size most comfortably handled by the target processor, but 
is guaranteed to never be less than 16 bits wide. Typically it is 16 bits for an 
8-bit MPU and 32 bits for 16/32-bit devices. Some compilers permit the option of 
either size, i nt objects are signed, that is both positive and negative magnitudes 
are represented. A 2's complement representation is usual for MPU targets, but 
others are in use. 

A short int object is warranted never to be of a greater size than a plain int 
and conversely a long int is never smaller than i nt. Normally with MPU targets 
there are only two distinct sizes, with short being 16 bits and 1 ong being 32 bits. 
The i nt size maybe either, short and 1 ong i nt are also signed representations, 
covering the range -2 (M ~ 1) to +2 (n_1) - 1 in a 2’s complement implementation, 
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Figure 8.3 Basic set of C data types. 


where n is the number of bits. The qualifier short i nt can be shortened to just 
short; similarly long int shortens to long. 

Although all i nt types default to a signed representation, they maybe prefixed 
by the (redundant) qualifier signed for clarity. The unsigned qualifier can be 
used to give an object which is positive only, and covering the range 0 to 2 n - 1. 






206 C FOR THE MICROPROCESSOR ENGINEER 


Thus the declaration: 
unsigned long int sum; 

defines an object called sum which can range from 0 to 4,294,967,295 (assuming 
4-byte size). 

char types are not guaranteed either way. The qualifier si gned or unsi gned 
must be used if such objects are reliably to partake in mathematical operations 
with other integer types, see Section 8.4. 

A void object does not exist and does not (naturally) take up any space. It is 
normally used to declare that a function does not return a variable back to the 
caller or that no variable is passed by the caller to the function. This is illustrated 
in Section 9.1. voi d could properly be said to be a pseudo type. 

One of the major properties of an object is where it will be stored: in a register, 
in an absolute memory location or in a relative memory location in a stack-based 
frame. The programmer can use the qualifiers register, static or auto to 
declare which storage class the named variable is to be assigned. 

In order to illustrate how these high-level attributes map down to assembly 
level, I have compiled three versions of a slightly modified version of the sum- 
of-integers C program of Table. [8711 The output of this compiler is shown in 
Table 18.21 This shows the original C source code statements as co mm ents (the 
syntax of the assembler uses a * to denote a comment) interspersed with their 
resulting assembly-level instructions. I have manually added comments in paren¬ 
theses to clarify what is happening at this assembly level; the compiler cannot 
do this. 

By default, all variables are automatically assigned locations in a frame when 
they are defined on entry to the function in which they operate. They are then 
accessed relative to the Frame Pointer, as illustrated in Figs l5.6l and l5.7l This is 
the situation shown in Table I8.2l al where the variables are declared type auto 
(line C3). The Frame is made eight deep (LINK A6,#-8) as described in Sec¬ 
tion 5.2. The 4-byte variable n is located at A6-4: 3:2:1, and can be fetched 
using MOVE. L -4 (A6) , D7, where A6 is the Frame Pointer. Similarly the resulting 
4-byte sum is located at A6-8: 7:6: 5, hence the operation MOVE. L D7, -8 (A6). 
Although the qualifier auto is shown in this example, it is usually omitted as the 
default. 

Once a function or compound statement has been completed, the frame for 
any internally defined variables (that is variables defined inside the braces) is 
closed and its contents lost. Thus if that code is re-entered at some time in the 
future, no sensible use can be made of an auto variable's previous incarnation. 
Thus an auto variable's lifetime is simply from where it is defined to its corre¬ 
sponding closing brace. It is unknown outside this region, that is its scope is 
local to that of the braces within which it was defined. 

A static variable is permanently allocated storage, rather than residing inside 
a transient frame. Thus in Table 1872T b). both variables n and sum are given room 
in the data program section (.data is the directive used for the Whitesmith's 
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assembler as equivalent to . psect _data, and similarly . text for the text sec¬ 
tion), and the compiler names them L3_n and L31_sum respectively. Now to 
fetch n we have the instruction MOVE. L L3_n,D7. Similarly, to update sum we 
have MOVE. L D7, L31_sum. Both L3_n and L31_sum of course translate to abso- 


Table l872l Variable storage class (continued next page). 


* 1 sum_of_n() 

2 { 

. text 
. even 

_sum_of_n: link a6,#-8 

* 3 auto unsigned int n,sum; 

* 4 sum=0; 

clr.l -8(a6) 

* 5 while(n>0) 

LI: tst.l -4(a6) 

beq.s Lll 

* 6 {sum=sum+n--;} 

move.l -4(a6),al 

subq.l #l,-4(a6) 

move.l al,d7 

add.l d7,-8(a6) 

bra.s LI 

* 7 return(sum); 

Lll: move.l -8(a6),d7 

unlk a6 

rts 

.globl _sum_of_n 

8 } 


(Open frame for n and sum ) 


(sum in -8(a6) =0 ) 

(n in -4(a6) checked for =0? ) 

(Exit if true ) 

(Get n ) 

(Decrement it in its lair ) 

(Move old n to d7 ) 

(Add old n to sum ) 

(and repeat ) 

(Sum returned in d7 ) 


(a) Variables stored in relative memory (38 bytes). 


. data 
. even 

L3_n: .byte 0,0,0,0 (4 bytes for n at L3_n 

. even 

L31_sum: .byte 0,0,0,0 (and 4 for sum at L31_sum 

* 1 sum_of_nQ 

2 { 


. text 
. even 

* 3 static unsigned int n,sum; 

* 4 sum=0; 

_sum_of_n: clr.l L31_sum 

5 while(n>0) 

LI: tst.l L3_n 

beq.s Lll 

* 6 {sum=sum+n--;} 

move.l L3_n,al 

subq.l #l,L3_n 

move.l al,d7 

add.l d7,L31_sum 
bra.s LI 

* 7 return(sum); 

Lll: move.l L31_sum,d7 

rts 

.globl _sum_of_n 

8 } 


(sum = 0 

(Check for n=0? 

(Exit if true 

(Get n 

(Decrement it in its lair 
(Move old n to d7 
(Add old n to sum 
(and repeat 

(sum returned in d7 


) 

) 


) 

) 

) 

) 

) 

) 

) 

) 

) 


(b) Variables stored in absolute memory (44 bytes). 
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Table 8.2 (continued! Variable storage class. 

1 sum_of_n() 

2 { 

. text 
. even 

,sum_of_n: movem.l d5/d4,-(sp) (Save used registers ) 

3 register unsigned int n,sum; 


* 4 sum=0; 

moveq.l #0,d4 

* 5 while(n>0) 

LI: tst.l d5 

beq.s Lll 

* 6 {sum=sum+n--;} 

move.l d5,d7 
subq.l #l,d5 
add.l d7,d4 
bra.s LI 

* 7 return(sum); 

Lll: move.l d4,d7 

movem.l (sp)+,d5/d4 
rts 

.globl _sum_of_n 

* 8 } 


(Sum in d4.1 ) 

(Check for n=0? ) 

(Exit if true ) 

(move n to d7 ) 

(Decrement it in its lair ) 

(Add old n to sum ) 

(and repeat ) 

(Sum returned in d7 ) 

(Restore all used regs ) 


(c) Variables stored in registers (26 bytes) 


lute addresses after linking. Absolutely located variables usually take longer to 
fetch and return to memory as opposed to stack-based (i.e. auto) storage. 

Internally defined static variables have the same scope as auto variables, that 
is they are local to the function or compound statement in which they are defined. 
Their lifetime is however that of the program run. Thus if the code is re-entered, 
the last value of that stati c variable will still be known, stati c variables can 
be declared outside a function, in which case they are globally known from their 
definition point onwards. This will be discussed in Section 9.1. 

Variables have to be brought down to a register to be processed, and then re¬ 
turned to their abode in memory (either to a fixed or relative address) afterwards. 
All these toings and froings are time consuming and take up program space. In 
processors with a copious supply of registers, some can be reserved to keep vari¬ 
ables in situ for longer periods. This is especially valuable in a loop situation 
where, otherwise, variables would have to be continually swapped in and out of 
memory. 

The programmer can designate any number of auto variables as candidates 
for register storage, by using the keyword register. The compiler does not have 
to take any notice of this, and if ignored, such variables are treated as auto types. 
The Whitesmith 68020 C cross-compiler V3.2, used to generate the code shown 
in Table l872T c). reserves three Data and three Address registers for this purpose. 
Such register variables are widened to 32 bits (i nt) when fitted into the desig¬ 
nated register. Floating-point variables cannot be designated register types, but 
pointers (addresses) of such objects can. The scope and lifetime of register vari- 
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ables is identical to that of auto types; indeed they behave in an equivalent way 
to these, except that their address cannot be taken (see Section 9.2). 

Variables can be given a valu e at any time by simple assignment, for example 
sum = 0, in line C4 of Table [8?2l It is possible to initialize a variable at the time 
of its definition; thus we may have: 

int x=5, y=10, z=-3; 

defining x as a (signed) integer with an initial value of +5, y likewise at +10 and z 
starting off life as -3. How this is done at machine code level, and the resulting 
effects at high level, depends on whether the variable has a permanent storage 
location (i.e. is stati c) or temporary (i.e. auto or regi ster). 

Variables that are stati c as viewed from the high-level perspective are given 
their initial value before the program begins execution. This is obvious when the 
assembly code of a static initializing definition is examined, as shown in Ta- 
ble |8.31 a). Each stati c variable has its location reserved for it in data space with 
the constant in situ, by using a . BYTE (or DC) directive. Thus when the program 
is put into memory prior to execution by using a loader, these constants will be 
placed at the appropriate addresses. Loading is a one-off procedure, and no mat¬ 
ter how often the definition code is executed, the contents of these locations will 
not be re-initialized. 

Notice from the listing that c has been given an initial value of zero. The 
language specification guarantees that all uninitialized stati c variables will be 
zero (see also lines 4 and 5 of Table [7T4l d)). 

Relying on initial static variable states is dangerous when ROMable C code 
(that is code destined to be located in ROM) is executed. This is because there 
is no loading action before execution, the program being permanently stored 
in ROM. Data in RAM will be garbage, as the power-up state for such memory is 
unspecified. This is discussed further in Section 10.3. 

Variables that are auto or regi ste r can be initialized in their definition using 
the same syntax, but the effects are very different from the previous situation. As 
can be seen from Table [873T b) such a definition leads to executable code identical 
to that produced by the sequence: 

auto int a, b, c; 
a = 5; 
b = 23; 

Such code is executed at each pass through the function; that is the constants a 
and b are re-initialized each time, auto and register variables are said to be 
initialized at run time, as opposed to stati c variables which get their primary 
values at load time. Uninitialized auto and register variables have no pre¬ 
dictable value, as their locations will either hold the random power-up state of 
volatile memory or a value generated by some other code which used the same 
locations previously. 

The const and vol ati 1 e type modifiers are new to ANSII C 1*131 . An object 
declared const must not be changed by the compiler subsequent to any optional 
pre-initialization. Code such as: 
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Table 8.3 Initializing variables. 

.processor m6809 



.psect 

_data 


L3_a: 

.word 

5 

a is predefined as 5 

L31_b: 

.word 

23 

b is predefined as 23 

L32_c: 

. byte 

0,0 

No explicit initialization gives an initial 

; 1 

mai n() 



; 2 

{ 




.psect 

_text 


_mai n: 




; 3 

static 

int a=5, 

b=23, c; 

; 4 

c=a+b; 




ldd 

L3_a 

Get a 


addd 

L31_b 

Add b 


std 

L32_c 

= c 

; 5 

} 




rts 




. publ i c 
. end 

_mai n 



(a) Compile-time initialization, 
.processor m6809 



.psect 

_text 


; l 

mai nO 



; 2 

{ 



_mai n: 

pshs 

u 

Open frame 


1 eau 

, s 



leas 

-6, s 

Six deep 

; 3 

auto int 

a=5, b= 

23, c; 


ldd 

#5 

Make a 5 


std 

-2, u 



ldd 

#23 

Make b 23 


std 

-4, u 


; 4 

c=a+b; 




ldd 

-2, u 

Get a 


addd 

-4,u 

Add b 


std 

-6, u 

= c 

; 5 

} 




1 eas 

,u 

Close frame 


puls 

u,pc 



. publ i c 
. end 

_mai n 



(b) Run-time initialization. 


int a, b; 
const int c; 
c = a + b; 

will be flagged by the compiler as erroneous. 

The const qualifier is particularly useful in generating fixed look-up tables 
and arrays. Objects declared both static and const are normally placed by 
the compiler in the same program section as text. In a dedicated system, this is 
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usually in ROM. 

The const modifier can also be used together with the volatile modifier to 
declare a peripheral register as read-only. Specifically the volatile qualifier 
warns the compiler that the specified variable may be altered by some outside 
agency not known by the program; that is its value is subject to spontaneous and 
random change. Thus for example, an input port will reflect an external event 
not under the program's control. Also the compiler should never try to modify 
an input port's contents; it is read-only. 

The classical example is monitoring a bit in a Status register, waiting for an 
event to happen, for example: 


unsigned char i; 
volatile unsigned char status; 
const volatile unsigned char in_port; 
while (!(status & 0x80)) 

{;} 

i = port; 


/* i is an ordinary variable 
/* status is the Status register 
/* in_port is the read-only input 
/* As long as bit 7 is False (0) 

/* Do this (a null statement) 

/* When bit 7 is True (1), read in_port 


Here the Status register is continually ANDed (&, the bitwise AND operator) with 
the mask 1 0000000b (80b = 0x80). If bit 7 (the flag which says an event has hap¬ 
pened out there) is 0 or False, then the expression ! (status & 0x80) returns 
! (False). As ! is the logic NOT operator, this yields NOT False or True, and 
the body of the while construction is executed. The single ; statement termi¬ 
nator is used to give a null body. When bit 7 is high ! (status & 0x80) returns 
! (True) = Fal se and the polling terminates. 

If the volatile qualifier is not used, then the compiler may well optimize 
the situation by reading in status to a register. The compiler will then contin¬ 
ually test this copy, to save regularly bringing it down into the MPU. This would 
only make sense if status were an ordinary variable whose value could only 
be changed by the compiler. Note that the port peripheral register has been 
declared both const and vol ati 1 e. This means it is a read-only object (the com¬ 
piler should not try and alter it) and its value can only be modified by an outside 
agency. The object descriptor pair of modifiers unsigned char are the equiva¬ 
lent to saying that the qualified object is byte sized, as illustrated below. Another 
example of vol ati 1 e is given in the listing of TableNormally objects of this 
kind are pointers to (i.e. addresses of) hardware ports or fixed memory locations, 
rather than the objects themselves. We will discuss pointers in Section 9.2. 


read-only 

externally alterable 
byte sized 
name of object 


const volatile unsigned char in_port 

Object identifiers (that is their names) can be any combination of case sensi¬ 
tive alphanumerics and the underscore character _. The first character must not 
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be a number. With one exception, the initial 31 characters are guaranteed to be 
significant. Longer identifiers can be used, but the additional characters may be 
ignored; for example the two variables: 

base_emitter_bias_resistor_R1150 
base_emitter_bias_resistor_R1159 

may be treated as the same. 

The exception is variables which are declared extern. These are not defined 
anywhere in the program but are assumed to be added later through the linker, 
for example calls to library functions. As the C specification does not include 
the linker, only the first six characters can be relied on to be significant, and case 
distinctions may be ignored for such variables. External variables are discussed 
in Section 9.1. 

Integer constants can be written in decimal, hexadecimal or octal. Thus the 
statements: 

i = 255; /* Decimal */ 

i = OxFF; /* Hexadecimal */ 

i = 0377; /* Octal */ 

are identical, with the Ox prefix indicating hexadecimal and 0 for octal. Be careful, 
377 (decimal) and 0377 (octal) are very different in C. Character constants are 
indicated by single quotes; thus: 

i = 'a' /* Same as i = 0x61; if ASCII */ 

See lines 4 and 5 of Table [831 for another example. 

Constants are normally regarded by the compiler as plain integer, or if floating¬ 
point, then double precision. The default type can be overridden by using the 
suffix L for a long integer or long double floating-point, U for an unsigned inte¬ 
ger, UL for an unsigned long integer and (F) for a single-precision floating-point 
constant. 

If an integer constant is too large to fit into a plain integer size, then the 
compiler will promote it according to the rules: 

int — long — unsigned long (if plain decimal) 

int — unsigned int — long — unsigned long (if hex or octal) 

Similarly real constants may be promoted to long double precision. 

All integer constants are regarded as positive, with the assignment i = -255; 
being treated as a positive constant of 255, modified by the unary operator - 
(minus). 

Some typical floating-point assignments are: 

i = 255. 
i = 255.0 
i = 2.55E2 
i = . 0255E+4 
i = 2550.E-l 


/* Note use of decimal point */ 
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which are all identical. 


8.3 Operators, Expressions and Statements 

We have already casually introduced several C operators without much discus¬ 
sion. These were + for Addition, - for Subtraction, * for Multiplication, / for 
Division, ++ and -- for Increment and Decrement, & for bitwise AND, ! for logic 
NOT and () to pass parameters to a function. In all, there are 45 defined opera¬ 
tors in C as listed in Table l8~4l 

Operators are used to combine operands into expressions, for example: 
x + y * z 

Here we have a problem, will z be multiplied by the sum of x and y, or will x be 
added to the product of y and z? The outcome will differ considerably between 
the two cases. Parentheses can be used to force the way an expression is put 
together; thus we can have: 

x + (y * z) 

(x + y) * z 

as C follows the usual rules of computing the contents of parentheses first (in¬ 
nermost outwards for nested parentheses). 

The way in which an expression is combined is obviously of critical impor¬ 
tance. In C, operators are graded in order of their precedence. T able [hT-TI which 
lists operators in descending order of precedence, shows that multiplication is 
of a higher precedence than addition, and so will be implemented first. Thus 
the first form of the parenthesized expression above is equivalent to our original 
statement. 

This still leaves us with the problem of mixing operators at the same level of 
precedence. For example: 

x/y/z 

Is this (x/y)/z or x/(y/z)? The outcomes are very different. Most operators 
associate from left to right, thus the equivalent here is: 

(x/y)/z 

An example of right to left association is the statement: 
f=x=y=z=0; 

What value will f have? The answer here is 0, as assignment operators associate 
from right to left. Firstly z will be assigned to 0, then y to z (i.e. 0), then x to y 
(i.e. 0) then f to x (i.e. 0); so all variables will be set to 0, that is: 


f = (x = (y = (z = 0))); 
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Table [8~4l C operators, their precedence and associativity (continued next page). 


Operator 

Operation 

Example 

Top priority 

Direction (associativity) => 

0 

Function call 

sqr() 

[] 

Array element 

x [6] 


Structure element 

PIA1.CRA 

-> 

Structure element using a pointer 


Unary operators 

Direction (associativity) <= 

! 

Logical NOT 

! x 

~ 

Inversion (l's complement) 

~x 

- 

Negative 

y=-x 

+ 

Unary plus 

y=x- +(y+z) 

++ 

Increment 

x++ or ++x 

-- 

Decrement 

x-- or --x 

& 

Address of 

&x 

* 

Contents of address 

*address 

(type) 

Cast 

(1ong)x 

sizeof 

Size of object in bytes 

sizeof x 

Arithmetic 

Direction (associativity) => 

* 

Multiplication 

z=x*y 

/ 

Division 

z=x/y 

% 

Remainder 

z=x%y (Integer types only) 

+ 

Addition 

z=x+y 

- 

Subtraction 

z=x-y 

Shift 


Integer types only 

Direction (associativity) => 

» 

Shift left 

z=x»3 

« 

Shift right 

z=x«3 

Relational operators 

Boolean objects 

Direction (associativity) => 

< 

Less than 

while (x<3 ) 

<= 

Less than or equal 

while (x<=3) 

> 

Greater than 

while (x>3 ) 

> = 

Greater than or equal 

while (x>=3) 

== 

Equivalent 

while (x==y) 

J = 

Not equivalent 

while (x!=0) 
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Table 8.4 (continued! C operators, their precedence and associativity. 


Operator 

Operation 

Example 

Bitwise logic 

Integer types only 

Direction (associativity) => 

& 

AND 

x&OxFE (Clear bit 0) 

A 

Exclusive-OR 

xAOxOl (Toggle bit 0) 

l 

OR 

x 10x01 (Set bit 0) 

Objectwise logic 

Boolean objects 

Direction (associativity) => 

&& 

Logical AND 

x&&y is True if both x and y are True 

1 1 

Logical OR 

x | [ y is True if both or either x and y are True 

? : 

Conditional 

x=(y>z)?5 :10 x=5 if y>z True else x=10 

Assignment 

Direction (associativity) <= 

= 

Simple 

x=3 

+= 

Compound plus 

x+=3 e.g. (x=x+3) 

-= 

Compound minus 

x-=3 e.g. (x=x-3) 

*= 

Compound multiply 

x*=3 e.g. (x=x*3) 

/= 

Compound divide 

x/=3 e.g. (x=x/3) 

%= 

Compound remainder 

x%=3 e.g. (x=x%3) 

&= 

Compound bit AND 

x&=3 e.g. (x=x&3) 

A= 

Compound bit EX-OR 

xA=3 e.g. (x=xA3) 

1 = 

Compound bit OR 

x | =3 e.g. (x=x| 3) 

«= 

Compound shift left 

x«=3 e.g. (x=x«3) 

»= 

Compound shift right 

x»=3 e.g. (x=x»3) 

Direction (associativity) =s> 

J 

Concatenate 

if(x=0,y=3;x<10,x++) 

Lowest priority 


We have illustrated this situation using a statement as opposed to an expres¬ 
sion. C programs are made up of a series of statements or actions, each ter¬ 
minated by a semicolon ;. The majority of these are expression statements, 
where combinations of variables and constants are linked by one or more op¬ 
erators. Compound statements may be formed by enclosing a series of simple 
statements in braces, as shown in Table lSJl lines 11-14. Anywhere it is legitimate 
for a simple statement to appear, can be filled by a complex statement. 
Expressions have values, for example in the peculiar looking statement: 

x = 12 * (y = z + 5) + 2; 

the expression y = z + 5 assigns the value z + 5 to y and takes this outcome 
































216 C FOR THE MICROPROCESSOR ENGINEER 


for itself. Thus it is equivalent to the compound statement: 

{ 

y = z + 5; 
x = 12 * y + 2; 

} 

Most of the operators listed in Table f8~4l are intuitive and will not be covered 
in any detail here. Apart from functions, arrays and structures, discussions of 
which are deferred to Chapter 9, the unary operators have the highest priority. 
Unary operators attach to a single object, for example ~x inverts all bits in x 
(l's complement). Most operators are binary in that they connect two objects, 
for example x + y. Unaries bind very tightly to their object due to their high 
priority; thus: 

a = b + ~x; 

and 

a = b + (~x) ; 

are the same, as ~ has a higher priority than the binary addition operator. 

Care must be taken when inverting C objects, as all zeros implicit in the vari¬ 
able become ones. Consider: 

inti = 0xA9, j; 

j = ~i; /* j = 0xFF56 or 0xFFFF56 */ 

j = ~i & OxFF; /* j = 0x0056 or 0x000056 */ 

Although i is assigned constant 0xA9, its bit pattern will be 0000 0000 1010 
1 001 b or 0000 0000 0000 0000 0000 0000 1010 1 001 b, depending on whether 
int is 16 or 32 bits. On inversion, all the implicit zero bits will become one 
as shown. Bit ANDing (the && operator) by 1 1 1 1 1111 b will clear these, as this 
i nt constant has implicit leading digits of zero. & has a lower priority than ~, so 
no parentheses are required. In a 2's complement machine the unary - operator 
acts in a si mi lar way to ~, that is -a is the same as ~a + 1 (2's complement is 
invert plus 1). As you would expect, unary - simply changes the sign of the object. 
Consider the statement: 

f = a + (b-c); 

You might think that the expression (b-c) would be evaluated first and then a 
added to it. In fact C will ignore the parentheses, deeming them unnecessary, 
as the binary addition (+) and subtraction (-) operators have the same level of 
priority. Then, according to the table, evaluation occurs from left to right; that 
is a + b and then -c. If it is important to you to add (b-c) to a and not b alone 
(perhaps because you are afraid of overflow) then the unary + operator will ensure 
this happens; that is: 


f = a + +(b-c); 
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Unary + forces evaluation of its operand, as this has a higher priority than the 
binary + Addition operator (see Table l8~4l >. 

Although C guarantees the way an expression is put together according to the 
rules of precedence and associativity, it says nothing about the sequence in which 
component sub-expressions are produced. Consider the following (convoluted) 
statement: 

f = a + (z = z+4) + 3*z; 

where the writer hoped that the parentheses would force the variable z to be 
incremented by four first, then multiplied by 3 and finally, left to right, a added 
to the new value of z and then added to three times the new value of z. But 
what if the compiler took it into its head firstly to multiply z by three (i.e. old z) 
and store the answer away somewhere, then evaluate (z+4) and store it away, 
and then add a to the new value of z plus three times the old value! This type 
of occurrence is known as a side effect, as it is usually caused by using an as¬ 
signment, increment, decrement or function that changes the value of an object 
that appears elsewhere in the expression. C makes no promises that side effects 
will occur in a predictable order within a single statement fl4l . A safer sequence 
would be: 

z = z + 4; 
f = a + z + 3*z; 

or 

f = a + 4*(z + 4); 

Unary operators normally tag their object to the left, the possible exception 
being the Increment ++ and Decrement -- unaries. These can be before or after 
the identifier; their effect being subtly different. A left Increment/Decrement 
unary operator means first change the object and then use it. A right unary 
means first use the (old) value in the calculations and then change the object. For 
example: 

sum = sum + n--; /* Add n to sum, then decrement n */ 

sum = sum + --n; /* Decrement n first, then add n to sum */ 

The former is clearly shown in line C6 of Table |8.?I b). First n is fetched from 
memory into internal storage (MOVE.L L3_n,Al), then the original object out 
there in memory is decremented (SUBQ. L #1, L3_n). Finally the original value is 
used for the addition (MOVE . L A1 , D7; ADD . L d7, L31_sum). 

Because of side effects, care must be taken that Incremented/Decremented 
objects do not appear elsewhere in the same statement; for example: 

z = 6*n-- + a/n; 

Will the n used in the denominator have the old or new value? You will be at the 
mercy of the vagaries of your compiler in writing such code. 

Rather confusingly, some of the unary operators have the same symbols as 
binary operators, with very different meanings; particularly address of (&) and 
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Table 8.5 Bitwise AND and Shift operations. 

. text 
. even 


link 

a6,#-4 

Make frame 

unsigned char packed_BCD, BCD_L0W, BCD_HIGH; 

BCD_L0W= 

(packed_BCD & 0x0f)+'0'; 

moveq.1 

#15,d7 

[D7] = OOOOOOFFh 

moveq.1 

#0, d6 


move.b 

-l(a6),d6 

[D6] = 000000[packed_BCD] 

and. 1 

d6,d7 

[D7] = 000000[packed_BCD&000000FFh] 

moveq.1 

#48,d6 

[D6] = 30h; that is 'O' 

add. 1 

d7,d6 

[D7] = 000000[packed_BCD&000000FFh 

move.b 

d6,-2(a6) 

Assigned to BCD_L0W 

BCD_HICH 

=(packed_BCD » 4)+'0'; 

moveq.1 

#0, d7 

[D7] = OOOOOOOOh 

move.b 

-l(a6),d7 

[D7] = 000000[packed_BCD] 

asr.l 

#4,d7 

[D7] = 000000[packed_BCD] » 4 

moveq.1 

#48,d6 

Once again add 'O' 

add. 1 

d7,d6 


move.b 

d6,-3(a6) 

Assigned to BCD_HIGH 

uni k 

a6 

Close frame 

rts 




contents of (*)■ The compiler normally has no difficulty distinguishing be¬ 
tween the unary and binary from their context (see Section 9.2 for these unary 
operators). 

We have already given an example of & as a bitwise binary operator. Also avail¬ 
able are OR (|), Exclusive-OR ( A ) and the unary bitwise NOT (~). In a binary bitwise 
operation, each bit of the integer object (not floating-point types) is affected by 
the corresponding bit of the integer operand. This latter right-hand operator can 
be a constant or variable. 

Bitwise logi c op erations are identical in action to their assembly-code cousins 
(e.g. see Table [T-H . Other bit-banging' operations are Shift Right (») and Shift 
Left («). As before only integer type objects are permitted. The following code: 

BCD_LOW = (packed_BCD & OxOF) + 'O'; 

BCD_HIGH = (packecLBCD » 4) + 'O'; 

separates the 8-bit object packecLBCD into its two 4-bit constituent BCD digits 
in their ASCII form. The low digit is obtained by clearing the upper four bits with 
a bitwise AND, whilst the high digit is separated out by shifting right four times. 
Adding ASCII 0 (i.e. 0x30) converts to the appropriate ASCII code. Parentheses 
are used, as & and » are of lower priority than +. Table IB31 shows how these 
operations translate to 68000 code. 

The Shift Left operation always feeds in zeros. Shifting right is more problem¬ 
atical. If the object is unsigned, then a Logic Shift Right is generated, with zeros 
moving in. The situation is confused when a signed object is being acted upon. 
Most compilers will emit an Arithmetic Shift Right, where the sign bit is propa¬ 
gated along. However, this is not guaranteed. If a Logic Shift Right is desired, 
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then the object can temporarily be treated as unsigned; for example: 

|_ Temporary cast 

z = (unsigned int)a » 6; 

where I have assumed a is a si gned i nt type. The unary operator (type) used 
to force the variable a is known as a cast. 

C has a range of relational and logic operations which treat objects as Booleans, 
that is having only two values, True (non-zero) and F alse ( zero). We have already 
used the Greater Than (>) operator in line 5 of Table [8721 Here the value of n is 
compared to 0. If Greater Than, then the outcome of the expression n > 0 is 1 
(i.e. True); otherwise the outcome is 0 (False). Actually in this case the construc¬ 
tion while (n) would do the same thing. Unary logic NOT (!) simply changes 
the truth value of the object; for example: 

while ((!n && m) || (n && !m) ) /* while this is true */ 

{do this, that and the other} /* loop body */ 

executes the loop body if n is False (! n True) AND m is True OR ELSE n is True 
AND m is False. In other words, only if one of m or n is False (i.e. 0) will the loop 
body be executed. Notice the use of && and | | for logic AND and OR, as opposed 
to the bitwise & and | operator symbols. 

All logic (Boolean) expressions are guaranteed to be evaluated left to right, and 
this evaluation ceases as soon as an overall result can be ascertained. Thus in the 
example above, if n were False and m were True, the sub-expression (n && ! m) 
would not be executed. Thus fancy programming such as: 

(!n && m)|| (n && !m++) 

would be dangerous as the m++ increment would only happen if n was True and/or 
m was False. In this case, the first expression would be False and the compiler 
would move onto the second expression. 

Mixing up the logic equivalent operator == and assignment operator = is a 
major source of error fl"5l (not helped by most texts calling == equal). Compare 
the following two statements: 

if (a == b) {do this;} /* Correct */ 

if (a = b) {do this;} /* Dangerous */ 

In the former case the value of a is compared to that of b. If they are the same, 
(True) the value of the expression is 1, and {thi s; } is executed. If they differ, 
the result is 0 (False) and {thi s; } is skipped (see page l224l Neither a nor b are 
changed by this process. In the latter case a is assigned the value of b, and the 
value of the expression is b. If b is non-zero then {thi s; } is done, and if zero, 
skipped. It is unlikely the programmer meant to do this, and if he/she did, then 
it should be done in a less obscure fashion. 

As a final example, consider the problem of determining the state of the 
most significant bit of an unsigned int object x. This simply requires AND- 
ing by 2 n ~ 1 , where n is the number of bits in the object. Unfortunately an i nt 
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object can have 16 bits in some implementations and 32 bits in others (other val¬ 
ues are also possible but rare). If the software is to be written in a portable form, 
then one of the two masks 2 15 (1 0000000b) and 2 31 (1 000000000000000b) has 
to be chosen. 

C has a unary operator called si zeof, which operates on a type designator or 
object, and which returns its size in bytes. This also applies to composite objects 
such as arrays and structures. Using this, a possible sequence might be: 


if (sizeof(x) == 4) 

/* Has x got 4 bytes? 

*/ 

{mask = 0x80000000;} 

/* If True 

*/ 

el se 



{mask = 0x8000;} 

/* If not True 

*/ 


msb = mask & x; 

Notice the use of == to compare the size of x with 4. 

A rather more ingenious coding is given by: 

msb = ((sizeof(x)==2) * 0x8000) + ((sizeof(x)==4) * 0x80000000)) & x; 

where we rely on a Boolean expression returning 0 if False and 1 if True. 

Where a variable is to be assigned to one of two values depending on the 
truth of an expression, C provides a compact ternary operation using the ?: pair. 
Repeating the above now gives us: 

msb = (sizeof(x)==2 ? 0x8000 : 0x80000000) & x; 

where the expression in parentheses evaluates to 0x8000 if si zeof (x)==2 eval¬ 
uates to True, else 0x80000000 if False. Try rewriting the statement using the 
NOT Equivalent (! =) operator. 

Besides Addition and Subtraction, the basic arithmetic operations of Multipli¬ 
cation, Division and Modulus (%) are provided. Division of two integral objects 
yields a truncated integral quotient, thus 6/4 = 1. The Modulus operation of 
two integral objects gives the remainder, thus 6%4 = 2. Truncation direction 
and modulus sign are implementation dependent with negative objects. Modu¬ 
lus only operates on integral objects. 

As an example, consider the following code which converts an 8-bit binary 
variable to a hundreds, tens and units BCD digit: 


unsigned char binary, hunds, tens, units; 
units = binary%10; /* e.g. 253%10 = 3 (units) */ 

binary = binary/10; /* Residue = 25 after this */ 

tens = binary%10; /* %10 gives 5 (tens) */ 

hunds = binary/10; /* /10 gives 2 (hundreds) */ 


In Section 9.2, we repeat this example for larger binary numbers, using an array 
data structure. 

Consider the statement above: 
binary = binary/10; 

In C this can be written in a compressed manner as: 
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binary /= 10; 

using the /= compound assignment function. This could be read as divide bi nary 
by 10. 

Apart from compound assignments' concise notation, there can be advantages 
in the size of machine code emitted where complex objects are involved. As an 
example, consider a 2-dimensional byte array (see Section 9.2) of 100 rows and 
12 columns. If, say, we wish to multiply an element 5 rows down and 3 columns 
across, we could write: 

x[5] [2] = x[5] [2] * n; 

using simple assignment. The compiler knows where the start address of the 
array is, so to get x[5] [2] it must multiply the number of rows (5) by the max¬ 
imum number of columns (i.e. 12). Finally add the actual number of columns 
(2 across). This is the number of bytes on from the start (62), see Fig. l9.31 b). and 
would then be used as part of some Indexed address mode to give the effective 
address (ea). Once x[5] [2] was down, it would be multiplied by n. The compiler 
would then move to the left side of the assignment, and if not very bright would 
again calculate the ea (probably previously thrown away) to determine the target 
address for the Store/Move. This takes lots of wasted time and code. 

The alternative compound assignment is written: 

x[5] [2] *= 2; 

The compiler now knows that the ea has only to be calculated once, which con¬ 
sequently produces a superior coding. 

Using this notation, line 6 of Table [8721 could be replaced by: 

sum += n; 

or even, using the comma operator (,), lines 5 and 6 could be combined as: 
while (sum += n--, n > 0) {;} 

The comma operator, shown at the bottom of Table l8~4l allows expressions 
to be concatenated. Each such expression is guaranteed to be evaluated from left 
sub-expression to right, with the value being that of the rightmost sub-expression. 
Thus, in the example above, sum += n-- will be executed and then the test n > 0. 
The value (True or False) of this latter is the one acted upon by the while instruc¬ 
tion. Notice the use of {;} to indicate a null statement (i.e. do nothing). The 
braces are optional. It is normally recommended that the comma operator be 
used with caution. 

A close scrutiny of the code produced in Table 1831 shows that the three objects 
packed_BCD, BCD_L0W and BCD_HIGH are stored in memory as bytes (at [A6] -1, 
[A6] -2 and [A6] -3 respectively), as expected by their declaration as char. How¬ 
ever, when brought down into a MPU register, they are converted into 32-bit i nts. 
For example: 

M0VEQ.L #0,D6 ; Clears all 32 bits 

M0VE.B -1(A6),D6 ; packecLBCD occupies lower 8 bits 
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shows the promotion of the unsigned char packed_BCD to 32-bit status by mak¬ 
ing the upper 24 bits zero. If packed_BCD had been signed, then a Sign Extension 
would have been used (e.g. EXT for the 68000 MPU). This promotion to i nt is the 
reason why an Arithmetic Shift Right (ASR) was used to implement » in line C5, 
as opposed to the expected LSR, as i nt is signed and the compiler sensibly uses 
Arithmetic Shift operations for signed numbers. 

In general, C prefers to do all its fixed point arithmetic in i nt form. Thus, 
as shown by the thick arrow in Fig. |8.4l all objects declared si gned or unsi gned 
char, signed or unsigned short are automatically made int for the duration 
of their stay in the processor. Some compilers give the option of disabling this 
widening, which can be useful for 8-bit MPUs which have difficulty in this area. 
However, this extension facility is non-standard. In a similar manner, C prefers to 
do its floating-point operations in doubl e f 1 oat form. This too may sometimes 
be changed to the non-standard single-precision float size, to save time and 
storage. 

C permits arithmetic with mixed types. Consider the following example: 


short z; 
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i nt x; 

unsigned long y; 
float a; 
a = x + y/z; 

What type will the right-hand side end up with, and how will that equate with the 
left-hand type? 

Well, firstly object z will be promoted to unsi gned 1 ong to match the numer¬ 
ator, and the result will be unsigned long. Then x will be promoted to unsigned 
long to match, and added to give an unsigned long right-hand value. Finally 
t hi s is converted to f 1 oat, which is the value assigned to the left-hand variable. 

In general, in a mix ed type operation, the objects involved migrate upwards 
to the highest commonalty, as defined in the hierarchy of Fig. 18.41 with i nt being 
the base integral type. 

One point that needs watching is the notion that an unsigned integral type is of 
a higher order than its signed counterpart. This is because an unsigned quantity 
can hold a larger magnitude for the same size, see Fig. 18.31 This can cause strange 
outcomes when mixing unsigned and signed types together. For example, in the 
statement above, if x was -1 on a 2's complement machine, it would be stored 
as OxFFFFFFFF (for 32 bits). Now because of y, it must be converted to unsigned 
long, and in this case it will be treated as a positive number (4,294,967,295). In 
some situations, this can lead to spectacular results, although it will work out 
correctly in this case. In general, if possible do not mix signed and unsigned 
numbers. 

In an assignment, the right-hand value (r_value) is converted to the Lvalue 
type, in this example, the f 1 oat equivalent to the unsi gned 1 ong r_value. Where 
the Lvalue type is further down the hierarchy, then truncation or other unspeci¬ 
fied shortening will occur, and unless the actual value can be fitted into the lower 
type, an erroneous result will be recorded. 

As a final example of what can go wrong consider the code fragment: 

long int sum; /* Reserve 32 bits for sum */ 

unsigned int n; /* and a 16-bit n */ 

sum = (n+l)*n/2; /* Sum of all integers up to n */ 

compiled with a 16-bit int and 32-bit long compiler model. All arithmetic is 
done at unsi gned i nt level (i.e. 16-bit precision). However, if n is large enough, 
overflow will occur; for example if n is 256, then (n+l)*n will give 256 and not 
65,792 (256 is 65, 792 - 65, 536)! The fact that sum is defined as long will not 
save the situation, as this means only that the final (erroneous) r_value will be 
promoted to 32 bits. If values of sum greater than 65,535 are expected, then 
the variable n may be treated as a 32-bit object by using the cast operator (i.e. 
(1 ong) n), which will force 32-bit arithmetic thus: 

sum = ((long)n+l) * n/2; 

Why didn't I bother to cast the second n? Why is the code of Table [7H3] safe? 
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Continue 


Continue {do that;} 


(a) Plain if 


(b) if-else 


Figure 8.5 Simple 2-way decisions. 

8.4 Program Flow Control 

The flow control instructions specify the structure of the computation process. 
Primarily they provide the means whereby the MPU can bypass, alternate or repeat 
a specified block of statements based on the outcome of an expression. In C this 
outcome is defined as False if the value returned is 0, otherwise it is True. 

The most fundamental decision structure is shown in Fig. 18.51 where the truth 
outcome of an expression used as the argument of the if instruction is used to 
decide between a 2-way branch. In (a), an expression returning True forces the 
execution of the do this; statement, otherwise nothing. The else instruction 
can be used in conjunction with i f, that is if-else, to force one statement on True 
and another on False. 

As an example of a straight i f decision, the following statement returns the 
positive equivalent (the modulus) of the variable x: 

if (x<0) {x= -x;} 

The statement following the if (expression) is executed when x<0 is True 
(i.e. x is negative), otherwise it is skipped. The braces surrounding the i f body 
are optional when it comprises a single statement. Compound statements must 
be braced as usual. As a matter of style, in this text braces are normally used 
irrespectively. 

An i f-el se construction is used in the following code snippet, which converts 
an ASCII-coded digit in the range' 0' to' 9' and' A' to' F' into its equivalent decimal 
value 0 to 15. 

if (ascii <= '9') 

{decimal = ascii - '0';} 

el se 

{decimal = ascii - 'O' - 7;} 
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Here the ASCII code for '0' (i.e. 30 h) is subtracted if the digit lies between '0' 
and '9' (30h-39h) and 3 7h is subtracted if it does not (which assumes that it 
must be between 'A' and ’F’, 41 h - 46h). 

i f instructions may be nested, although care must be taken in using braces 
to force the proper association. As an example, consider a Real-Time Clock func¬ 
tion entered via an interrupt once a second. We will discuss how this might be 
accomplished in a C program in Section 10.2. Once in the function, we have three 
variables: Seconds, Mi nutes and Hours. The logic for the update is: 

1. Add one to the Seconds count. 

2. If t hi s gives 60 then zero Seconds and increment Mi nutes. 

3. If this gives 60 then zero Mi nutes and increment Hours. 

4. If this gives 24 then zero Hours. 

As shown in Table [8TTT1 the Seconds variable is first incremented and then 
compared for greater than 59 in line 4 (note the ++ operator before the variable 
Seconds). If this is not True then the following complex statement, delineated 
by the braces of lines 5 and 15 is skipped and the function exits. Otherwise, 
this complex statement is entered, Seconds are zeroed in line 6 and the next 
i f instruction executed. This does the same thing with Mi nutes, and if the result 
is not greater than 59 its body, delineated by braces in lines 8 and 14, is skipped 
and the function terminated. Finally the third-level nested i f increments and 
checks Hou rs. If the result is not greater than 23, then its body is skipped to the 
brace in line 13 and thence the exit point at line 16. 


Table 8.6 A nested i f Real-Time Clock interrupt service routine. 
1: unsigned char Seconds,Minutes,Hours; 

2: void clock(void) 

3: { 

4: if(++Seconds>59) 

5: { 

6: Seconds=0; 

7: i f(++Minutes>59) 

8 : { 

9: Minutes=0; 

10: if(++Hours>23) 

11 : { 

12: Hours=0; 

13: } 

14: } 

15: } 

16: return; 

17: } 


Notice how the i f instructions are indented, and how the different nesting 
levels' braces line up. It is essential to take care with constructions like this to 
avoid error. 

Nesting i fs with el ses can cause errors, as any el se will associate itself to 
the nearest unattached i f, thus: 
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1: if (n > 0) /* IF n is above zero TFIEN */ 

2: if (n > max) {n = max;} /* restrict to no more than max */ 

3: else n = 0; /* Otherwise ensure it never goes negative */ 

The writer of this code fragment meant to restrict the variable n to the range 
0 - max, limiting it to these boundary values if beyond. Thus the logic was: 

1. Check n above zero, IF False then make n = 0. 

2. Check n above max, IF True make n = max, ELSE do nothing. 

What actually happens is the el se of line 3 will attach itself to the i f of line 2, 
not that of line 1; thus if n is lower than zero, all of lines 2 and 3 are bypassed. 
Furthermore, if n is not above max, then n will be made zero! The situation is 
solved by proper use of braces: 

if (n > 0) 

{ 

if (n > max) {n = max;} 

} 

else n = 0; 

or better still use the else-if instruction: 

if (n > 0) {n = 0;} 

else if (n > max) {n = max;} 

Although nested i f s can be utilized to make multiple decisions, their use is 
not very elegant, and, as we have seen, error prone. The else-if construction 
illustrated in Fig. 18.61 is the more structured approach. The several expressions 
are evaluated in order until the first True result. The statement associated with 
t hi s expression is executed, and the rest of the chain is by-passed. An optional 
final else can be used at the end to give a default action. 

As an example, let us redo the Real-Time Clock function, this time using an 
el se-i f construction. In Table [8/71 line 4 is a plain i f, which checks the state 
of Seconds after incrementing. Should Seconds be less than 60 the dummy 
null statement {;} is executed and the rest of the structure bypassed. If not, 
then the Mi nutes variable must in turn be incremented and checked. However, 
first Seconds must be zeroed. I have used the comma concatenate operator to 


1 

2 

3 

4 

5 

6 

7 

8 
9 


Table 8.7 An else-if Real-Time Clock interrupt service routine. 
unsigned char Seconds, Mi nutes , Flours; 
void clock(void) 

{ 


if(++Seconds<60) {;} 

else if (Seconds=0,++Minutes<60) {;} 
else if (Minutes=0, ++Flours <60) {;} 

else {Flours=0;} 

return; 


} 
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Continue 


if (expression 1) 

{do this?} 

else if (expression 2) 
{do that;} 

else if (expression 3) 
{do the other;} 

else 

{another do;} 


Figure 8.6 Using el se-i f to make a multi-way decision. 


implement a compound expression doing both in line 5. As far as the el se-i f 
operator is concerned, the rightmost expression value is utilized in determining 
its action. Similarly the Hours variable is checked in line 6. A plain else gives 
the final fall-through option which happens only once a day, when going from 
23:59:59 to 00:00:00 hours. 

One final el se-i f example evaluates the factorial of an unsigned object val¬ 
ued between 0 and 12. If outside this range a zero is returned to indicate an 
error situation. Table [5781 is self-explanatory. Remember that as soon as a True 
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outcome is met, the associated expression is evaluated and the whole of the rest 
of the structure bypassed. As we shall see, there are more efficient ways to im¬ 
plement this function. 


1: 

2 : 

3: 

4: 

5 : 

6 : 

7: 

8 : 

9: 

10 

11 

12 

13 

14 

15 

16 

17 

18 


Table 8.8 Generating factorials using the el se-i f construct. 
unsigned long factor(int n) 

{ 

unsigned long factorial; 
if((n==0)I I(n==l)) 
else if (n==2) 

(n==3) 

(n==4) 

(n==5) 

(n==6) 

(n==7) 

(n==8) 

(n==9) 

(n==10) 

(n==ll) 

(n==12) 


if 

if 

if 

if 

if 

if 

if 

if 

if 

if 


el se 
el se 
el se 
el se 
el se 
el se 
el se 
el se 
el se 
el se 
el se 

return(factorial) 
} 


{factorial=1;} 

{factorial=2;} 

{factorial=6;} 
{factorial =24;} 
{factorial=120;} 
{factorial=720;} 
{factorial=5040;} 
{factorial=40320;} 
{factorial=362880;} 
{factorial=3628800;} 
{factorial=39916800;} 
{factorial=479001600;} 
{factorial=0;} 


/* Error condition */ 


The switch-case instruction gives an alternative multi-way decision structure, 
as shown in Fig. 18.71 This time a single expression, which must return an integral 
type result, is evaluated at the head of the structure. A series of case expres¬ 
sions compare this result with a constant. On finding equality, the accompanying 
statement is executed. If no equality is found, an optional default statement is 
allowed. 

The switch-case structure is much less flexible than its el se-i f multi¬ 
way cousin. Only one expression is evaluated, and actions are not mutually 
exclusive. We see from Fig. |8.7| that if, say, the expression produced a value 
equal to constant B, then not only is the do that; statement executed, but also 
do the other; and the default other do; as well. Normally the swi tch-case 
decision tree is implemented at machine level by using the result of the expres¬ 
sion as a pointer into a look-up table holding a series of Jump to Subroutines, 
that is the case routines. As these are stored consecutively in memory, once a 
statement is entered, it will be executed and on return will immediately jump 
to the next subroutine. Thus it will fall through all following case routines until 
the terminating Return from Subroutine instruction is reached. As compensa¬ 
tion for swi tch-case's lack of flexibility, the resulting machine code is normally 
more compact, and execution is quicker than an el se-i f equivalent. The code 
of Table 18751 yielded 264 bytes on a 68000 MPU compiler, while that of Table f8T9l 
took 212 bytes. Larger structures produce proportionately greater savings. 

Two things are noticeable concerning Table [8791 Firstly case statements can 
be stacked, as in line 6 where the outcome is the same if n is 0 or 1. Secondly 
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switch 


{expression returning any integer type){ 



Continue 


Figure 8.7 switch-case multi-way decision. 

each case statement is compound, ending with the b reak instruction. This forces 
the execution to bypass all remaining statements down to the return of line 20. 
Leaving out break is not a syntax error, but is rarely what the programmer meant 
to do fl~6l . 

swi tch-case structures are frequently used in conjunction with a keyboard 
to select an appropriate response to each keypress, usually by jumping to a sub¬ 
routine. Thus, if key M is pressed, do a memory examine; if V is pressed, view a 
block; etc. 
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Table 8.9 Generating factorials using the swi tch-case construct. 
unsigned long factor(int n) 

{ 

unsigned long factorial; 
switch(n) 

{ 


6: 

case 0 

case 1: 

{factori al=l; 

break;} 

7: 

case 2 


{factor!al=2; 

break;} 

8: 

case 3 


{factor! al=6; 

break;} 

9: 

case 4 


{factor! al=24; 

break;} 

10 

case 5 


{factor! al=120; 

break;} 

11 

case 6 


{factor! al=720; 

break;} 

12 

case 7 


{factor! al = 5040; 

break;} 

13 

case 8 


{factor!al=40320; 

break;} 

14 

case 9 


{factor!al=362880; 

break;} 

15 

case 10: 

{factor!al=3628800; 

break;} 

16 

case 11: 

{factor!al=39916800; 

break;} 

17 

case 12: 

{factor!al=479001600; 

break;} 

18 

default: 

{factor! al=0;} 


19 

} 




20 

return(factor!al) ; 



21 

} 





The loop structure is the standard technique for repeating a process a number 
of times, either on a single object or on an array or block of related objects. 
We have already extensively used this approach at assembly level, for example 
Table [5/71 C has three statements specifically handling loops: while, do-while 
and for. 

Initially, let us see how we could handle a loop without using specific looping 
instructions. Consider the following code fragment, which evaluates the factorial 
by repetitive multiplication of a decrementing n. 

factorial = 1; 

LOOP: if (n>l) {factorial *= n--; goto LOOP;} 

This uses the goto instruction, together with a label, to repeat the i f test on each 
pass of the loop body, in a similar fashion to an assembly language implementa¬ 
tion (see Table l4.13V 

The goto instruction can be used to force an unconditional branch to a label 
anywhere within a function. However, its use is frowned upon (it has been stated 
that the quality of programmers is a decreasing function of the density of goto 
instructions in the programs they produce E2), as used without care it can lead 
to spaghetti (unstructured) code. Nevertheless, its use is sometimes virtually 
indispensable, particularly when trying to escape to the outside world from within 
nested loops. Use with caution. 

We have already met the w hi le loop back in Table l8Jl Here the body — lines 
11 to 14 — was repetitively executed as long as the test n>0 was True. On False 
the code following the body, that is line 15, is entered. 

Three elements present in any loop should be noted. Firstly, variables must be 
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set to their initial state before the loop proper is entered, see line 9 in Table [87T1 
Then, in a whi 1 e construct, a test is made, as shown in Fig. l8.81 a). If the outcome 
is True, the loop body is executed. Finally, some change must be made to the test 
variables, so that this test will eventually have a False outcome, and execution will 
go on to the next code section. Sometimes this change is explicit, as in line 13 of 
Table l8Jl and sometimes implicit in the loop body. 


Table 8.10 Generating factorials using a while loop. 
unsigned long factor(int n) 

{ 

unsigned long factorial; 
factorial=1; 
whi 1 e(n>l) 

{ 

if(n>12) {factorial=0; break;} 
factorial *= n--; 

} 

return(factorial); 

} 


Two keywords are used in conjunction with while. A break forces an im¬ 
mediate exit from within the loop, usually on some exceptional situation. In 
Table |8T0l this occurs if n>12, which is the error condition demanding a return 
of zero. This is done by testing for greater than 12 and breaking if True in line 7. 
In the case of a nested loop, breaking will move the execution only to the next 
outer level. 

The continue keyword forces an early repeat of the test by jumping over the 
rest of the loop body. As an example, consider an array of signed elements (see 
Section 9.2). The following code totalizes only array members that are positive: 

sum =0, x = 0; 

—►- while (x < MAX) - 

} 

if(array[x] < 0) {continue;} - 

sum += array[x++]; 

- } 

Sometimes it is necessary to go through the loop body first before testing for 
exit. This ensures that at least one pass will be performed irrespective of the 
outcome of the test. The structure of this do-while (repeat-until) loop is shown 
in Fig. 18.81 b). A break can be used in a similar way as in whi 1 e, and conti nue 
causes a drop down to the test (rather than upwards). The do-whi 1 e loop is the 
least used of the three kinds of C loop constructions. 

The most versatile of the three is the for loop. This is similar to while but 
combines the initialization, test and loop variable update as its arguments; thus: 

forfexprl; expr2; expr3) 

{loop body} 
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causes expression 1 to be evaluated once at the beginning. Expression 2 is tested 
next, and while True enters the loop body. The third expression is evaluated after 
each loop iteration. Thus normally, but not exclusively, expression 1 is used to 
initialize variables, expression 2 for the while test and expression 3 to change the 
tested expression. The whi 1 e equivalent is: 

exprl; 

whi1e(expr2) 

{ 

loop body 
expr3; 

} 

As an example, lines 9 - 14 of Table [8Tl can be replaced by: 


for 



(sum =0; n > 0; n—) {sum += n;} 


initialization 


while this is True do body 
update loop variable 


the loop 


See also Table 110.141 

Expressions can be compounded using the concatenate , operator; so several 
variables can be initialized together, and fairly complex processing can be done 
directly in expression 3. The body of the loop must be enclosed by braces, unless 
it is a single statement, and there must be a body; for example: 

forfsum = 0; n>0; sum += n--) {;} 

has a null statement as its body (braces are optional). 

Any of the three expressions may be omitted, but the semicolons must stay. 
An omitted expr2 always returns a True result giving an endless loop. This 
may be deliberate, for instance traffic lights must be controlled continuously 
irrespective. In this case we could have the structure: 

mai n () 

{ 

for (;;) /* Forever do */ 

{Control traffic lights} 

} 

Similarly whi 1 e (1) can be used to delineate a continuous loop, as the expression 
is always True. See line 7 of Table l9.Sl b). 

break as usual causes an immediate exit from the loop, as in line C6 of Ta¬ 
ble [8TT] continue passes control to expr3, which usually updates the loop 
variable before returning to the test, see Fig. 18.81 c). 
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Table 8.11 Generating factorials using a for loop. 
unsigned long factor(int n) 

unsigned long factorial; 
for(factorial=l; n>l; n--) 

{ 

if(n>12) {factorial =0; break;} 
factorial*=n; 

} 

return(factorial); 

} 


(a) Source code. 

* 1 unsigned long factor(int n) 

2 { 

.text 
. even 

_factor: 




link 

a6,#-4 

* 

3 

unsigned long factorial 

* 

4 

for(factorial=l; n>l; n 



moveq 

.1 #1,dO 



move. 

1 dO, (-4, a6) 

LI: 


cmpi . 

1 #1,(8,a6) 



bl e. s 

Lll 

* 

5 


{ 

* 

6 


if(n>12) 



cmpi . 

1 #12,(8,a6) 



bl e. s 

L14 



clr.l 

(-4,a6) 



bra. s 

Lll 

* 

7 


factorial*=n; 

L14: 


move. 

1 (-4,a6),d7 



mul u. 

1 (8,a6),d7 



move. 

1 d7,(-4,a6) 

* 

8 


} 



subq. 

1 #1,(8,a6) 



bra. s 

LI 

* 

9 

return(factorial); 

Lll: 


move. 

1 (-4,a6),d7 



uni k 

a6 



rts 




. globl 

_factor 


10 

} 



* Open frame 


* factorial = -1 (at A6-4:-3:-2:-1) 

* n (at A6+8:9:A:B) > 1? 

* IF not THEN terminate 

{factorial=0; break;} 

* n > 12? 

* IF yes THEN go on 

* ELSE return 0 as an error marker 

* and break 

* Get factorial 

* Long (32x32) multiply by n 

* and return it 

* n-- 

* Repeat 

* Return factorial in D7.L 

* Close frame 

* and terminate function 


(b) Resulting 68020 MPU assembler code (64 bytes) with annotated comments. 
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CHAPTER 9 


More Naked C 


Here we discuss functions as the building block of C programs, data structures 
of various kinds, libraries and headers. In keeping with our definition of naked 
C in Chapter 8, we continue to studiously ignore hosted input/output and file 
handing operations. 


9.1 Functions 

The function in C is the direct equivalent to the subroutine at assembly level, 
and is directly translated as such. A function encapsulates an idea or algorithm 
into a named structure. It can be used in an expression as a normal variable, 
by naming it together with any parameters that are being passed. All functions, 
except void, return a value as defined by the return instruction. This is the 
value that is substituted for the functio n in th e calling expression. For example, 
if we have the function defined in Table [87111 then the code fragment: 

x = 4; 

y = l/factor(x); 

will make y the reciprocal of 4!, that is The value of factor (4) is of course 24, 
as returned in line 9 of that table. 

In this section, we specifically need to look at how functions are declared and 
defined, how parameters are passed back and forth, and the scope of objects 
declared inside and outside functions. 

C programs are structured as a collection of external objects. These objects 
are mainly global variables and functions. This is graphically shown in a much 
simplified form in Fig. 18.11 The main function, conventionally called main(), 
acts as a central spine calling up the various ancillary functions in the appropri¬ 
ate order, usually with a minimum of processing itself. In a hosted environment, 
mai n () interacts with the operating system, from which it can obtain and some¬ 
times return information. In a naked enviro nm ent it is normally entered via an 
assembly-level startup routine, and frequently runs forever in an endless loop. 
More details are given in Section 10.1. 

Although main() is regarded as a little special in C, in reality th e co mpiler 
treats it in the same fashion as any other function. The layout of Fig. 19.11 shows 
this, with mai n() being one of three functions in the figure. Each of these func¬ 
tions must be defined. A function definition, typical examples of which are 


236 




FUNCTIONS 23 7 



main function definition 



function 1 definition 



function 2 definition 



Figure 9.1 Layout of C programs. 
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shown in Tables l8Tl and [8f)l - l8. Ill consists of a prototype heading followed by 
local variable definitions and any called function declarations, and then by the 
body of the function. This body is a series of any legal C statements enclosed 
in braces. Unusually, these braces must be present even if the body comprises a 
single statement; for example: 

{ 

return (x*x); 

} 

is a legitimate function body (squaring x, which has been passed to the function). 
The return instruction is the mechanism whereby the function is assigned a 
value, as seen by the caller. If retu rn is omitted, the function will still exit back 
to the caller (i.e. there will be an RTS or equivalent at the end of the subroutine), 
but the value seen by the caller will be undefined. Functions which do not return 
a value, that is are voi d (e.g. Table |8.6[ , can either omit this statement or as a 
matter of style include a null return ; . Parentheses are optional around retu r n's 
expression, but are frequently used for clarity. 

The function prototype at the head of the body simply names and indicates 
the type of the function (i.e. its return value) and the types of any parameters 
passed to the function. Doing this, we have for our squaring function definition: 

int squarefsigned char x) /* Prototype head */ 

{ return (x*x); } /* Body */ 

The prototype declares the name of the function as square (), returning an 
i nt value and accepting a si gned char variable, here named x. In the body of 
the function, the formal parameter x behaves as an auto si gned char variable. 
It is local to this function, that is it is unknown outside. Note that in this case 
we can return an i nt, as we know that x will be promoted to i nt type for the 
calculation, see Fig. 18.41 If the type of the expression returned is not that indicated 
in the prototype, it will be converted using the normal C rules. 

At the assembly level, return is normally implemented by evaluating the ex¬ 
pression and putting it in a register prior to RTS. Thus in the line labelled Lll: of 
Table [8m factorial is placed in Data register_D7. In line Lll: of Table l7.14l 
sum is returned in Accumulator_D. The AX register is normally used for 80x86 fam¬ 
ily returns (see Table [TO. 141 b)) Registers may be concatenated for return types 
larger than a single register capacity, for example X:D in 6809 implementations 
for long or float returns. 

Unlike some languages, such as Pascal, function definitions are not allowed 
inside other functions; that is, each definition must be self-standing, as shown 
in Fig. 19.II and Table lifTI Any function can of course be called up from any other 
function (or even from itself for recursive operations). When a function is going 
to be used, it needs to be declared in a si mi lar manner to any other object. 

As our example for this section, consider a function that will return the integral 
power of a signed integral variable, that is y exp . This is shorthand for a repetitive 
multiplication of 1 by y, exp times, which covers the case where exp = 0. Thus, 
for example, 2 3 = 1 x 2x2x2. 
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The software implementation of Table [9711 a) uses this algorithm, but recog¬ 
nizes that overflow will occur for certain combinations of y and exp, and returns 0 
for this situation. This is deter min ed when the result of the kth multiplication is 


Table ftTTl The C program as a collection of functions (continued next page). 


mai n() 

{ 

signed char n; /* Define variable n */ 

unsigned char x; /* Define variable x */ 

register int p; /* Define variable p */ 

int powerfsigned char y, unsigned char exp); /* declare power() */ 

n=25; x=3; 

p=power(n,x); /* p = 25A{3} */ 

} 

/* Here follows the definition of power() */ 

int powerfsigned char y, unsigned char exp) /* Generates yA{exp} */ 

{ 

int result, old_result, abs(int); 
for(result=l; exp>0; exp--) 

{ 

old_result = result; 


result*=y; /* Repetitive multiplication by y */ 

if(absfresult)<=abs(old_result)) {return 0;} /* Overflow error */ 


} 

return result; 
} 


/* 

s 

Here follows the 

definition of abs() 

V 

int 

abs(i nt 

z) 




{return 

(z>=0 ? 

N 

1 

N 

'-Y-' 


(a) C source code. 



* 

1 

main() 



* 

2 


{ 


1 


. text 



2 


. even 



3 

_mai n: 

link 

a6,#-2 


4 


movem.1 

d5/d0,-(sp) 


* 

3 

signed char n; 

/* Define variable n */ 

* 

4 

unsigned char x; 

/* Define variable x */ 

* 

5 

register int p; 

/* Define variable p */ 

* 

6 

int 

powerfsigned char y 

unsigned char exp); /’-'declare power() */ 

* 

7 

n=25; x=3; 


5 


move.b 

#25,C-l,a6) 


6 


move.b 

#3,(-2,a6) 


* 

8 

p=power(n,x); 

/* P = 25A{3} */ 

7 


moveq.1 

#0, d5 


8 


move.b 

C-2,a6) ,d5 

Copy x from [A6]-2, extended to int 

9 


move.1 

d5,(sp) 

Put on Stack 

10 


move.b 

C-l,a6) ,d5 

Copy n from [A6J-1 

11 


extb.1 

d5 

Sign extended to int 

12 


move.1 

d5,-(sp) 

and push out on Stack 

13 


jsr 

_powe r 

Co do ftn power(n,x), returning in D7.L 

14 


addq. 1 

#4, sp 

Restore SP from last push 

15 


move.1 

d7,d5 

p lives in D5.L (register variable) 

16 


movem.1 

(sp)+,d5/d0 

Move it to D7.L 

17 


uni k 

a6 

and return 

18 


rts 



* 

9 

} 



* 

10 
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Table 9.1 (continued! The C program as a collection of functions. 


* 

11 

/* Here 

follows the definition of powerO 

V 

* 

12 





* 

13 

int power(signed char y, unsigned char exp) /* Generates yA{exp} 

*/ 

* 

14 


{ 



19 


. even 




20 

power 

link 

a6,#-16 



* 

15 

int 

result, old_result, abs(int); 


* 

16 

for(result=l; exp>0; exp 

-) 


21 


moveq.1 

#1, dO 



22 


move.1 

dO,(-4,a6) 

result (living in [A6]-4) = 1 


23 

LI: 

tst.b 

(15,a6) 

exp (living in [A6]+15) > 0? 


24 


beq. s 

Lll 

Break if True 


* 

17 


{ 



* 

18 


old result = result; 



25 


move.1 

(-4, a6),(-8,a6) 

old_result at [A6]-8 = result 


* 

19 


result*=y; 

/* Repetitive multiplication by y 

V 

26 


move.1 

(-4,a6),d7 

Get result 


27 


move.b 

(11,a6),d6 

and y as passed by caller in [A6]+ll 


28 


extb.1 

d6 

68020 style sign extension byte to int 


29 


mulu. 1 

d6,d7 

68020 type 32x32 multiplication 


30 


move.1 

d7,(-4,a6) 

put away as new result 


* 

20 

if(abs(result)<=abs(old_result)) {return 0;}/* Overflow error 

V 

31 


move.1 

(-8,a6),(sp) 

Copy old_result and put into stack 


32 


jsr 

abs 

Get absolute value back in D7.L 


33 


move.1 

d7,(-12,a6) 

Put away for safekeeping in [A6J-12 


34 


move.1 

(-4,a6),(sp) 

Repeat for result 


35 


jsr 

abs 



36 


cmp. 1 

(-12,a6),d7 

Compare abs(result) with abs(old_result) 

37 


bgt. s 

L12 

IF greater THEN continue 


38 


moveq.1 

#0, d7 

ELSE return a zero to indicate error 


39 


uni k 

a6 



40 


rts 




41 

L12: 

subq.b 

#1,(15,a6) 

Decrement exp char 


42 


bra. s 

LI 

and repeat for loop 


* 

21 

} 




* 

22 

return 

result; 

(-4,a6),d7 



43 

Lll: 

move.1 

Exit with power in D7.L 


44 


uni k 

a6 



45 


rts 





* 

23 

} 



* 

24 




* 

25 

/* Here 

follows the definition 

of abs() 

* 

26 



* 

27 

int abs (int z) 


* 

28 

{return (z>=0 ? z:-z);} 


46 


. even 



47 

_abs: 

link 

a6,#0 


48 


tst.l 

(8,a6) 

; Test z passed in [A6]+8 

49 


blt.s 

L01 

; IF less, then prepare to negate 

50 


move. 1 

(8,a6),d7 

; ELSE exit with z unchanged 

51 


bra. s 

L21 


52 

L01: 

move. 1 

(8,a6),d7 

; Get z 

53 


neg. 1 

d7 

; negate it 

54 

L21: 

uni k 

a6 


55 


rts 


; Return with abs(z) in D7.L 

56 


.globi 

_mai n 


57 


.globi 

_abs 


58 


.globi 

_power 



(b) Resulting 68020 MPU assembly code (line numbers added for clarity). 
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less than the (k - 1)th. However, as y can be signed, it is the modulus of these 
results that must be compared, rather than their actual value. 

The program is structured as three functions. The obligatory mai n () is for 
demonstration only, and terminates with the value of 25 3 . Of interest to us is the 
declaration in line C6 of the function power (). This declaration is in prototype 
form, where the function return type (i nt here) is followed by its name and the 
type of the two objects to be passed (a signed char and unsigned char). For 
clarity the formal parameters y and exp are used in the declaration. This is 
optional, and the declaration: 

int powerfsigned char, unsigned char); 

is acceptable. 

This line is a declaration, unlike lines C3 - 5 where variables are defined. The 
difference is that a definition gives the properties of the referenced object as 
well as reserving storage. On the other hand a declaration only makes known the 
object's properties; no storage space is assigned. 

A declaration function prototype is not mandatory, and the alternative: 

int power(); 

is accepted by the compiler without complaint. However, without a prototype 
declaration, the compiler will not be able to check that the writer has sent the 
correct number and types of variables to the function. This was an endless source 
of error in old C, where prototyping was not featured. Indeed out of deference to 
old C, the compiler will accept no function declaration at all, and will assume an 
i nt type return as default. Not recommended! I have given mai n() an old style 
name-only header rather than the prototype void main (void), as a function 
not returning any type and not receiving any parameters. Table [8761 is another 
example of the use of voi d. 

Function power () is defined in lines C12 - 23 in the normal way, with a proto¬ 
type head and body in braces. The formal parameters y and exp are known only 
down to the closing brace and so their names can be reused by any other func¬ 
tion, although for clarity it is better not to reuse parameter names. The actual 
(as opposed to formal) parameters sent are of course n and x. 

Function power () makes use of function abs() twice, so this is declared as 
an object known to it in line Ci5, together with other variable definitions. The 
definition of abs() is in lines C27 and C28 (see the ?: operator on page [220). 

Any number of parameters may be passed to a function, although only one 
can be returned (a function can have only one value at a time!). Parameters are 
passed by value (i.e. copied) in a manner which is implementation dependent, 
but normally on the System stack, as illustrated in Fig. IS.41 and described in Sec¬ 
tion 5.2. The mechanism can clearly be seen by inspecting the assembly code of 
Table l9Hf b). Each function has its own private frame, in which its local (i.e. auto) 
variables are stored. For mai n () this frame is two bytes deep (line 3) and holds x 
at the bottom and n at the top. To send the value of x to power () it is first copied 
into D5 and put on the System stack (lines 7-9) and the process repeated for n 
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(lines 10-12). Finally a J SR is made to _powe r : in the normal way to transfer to 
the subroutine. 

Several implementation-specific properties of this compiler (Whitesmiths V3.2 
68020 C cross-compiler) cloud this issue. Firstly, all functions identified as 
name() commence at _name: at assembly level. This compiler passes chars (and 
shorts) promoted to ints, although they are treated according to their proper 
type in the function (lines 27 and 41). This is probably to cope with old-style non¬ 
prototype declarations, where parameters are promoted to int or double m. 
Finally, and rather obscurely, the compiler always puts the first variable passed 
(the rightmost) away using the plain Address Register Indirect address mode, 
whereas further variables (moving leftward) use the Address Register Indirect 
with Pre-Decrement mode for a proper Push action. Thus we have: 

9 MOVE.L D5,(SP) ; A straight move to [A7] 

12 MOVE.L D5,-(SP) ; A proper Push to -[A7] 

Why is this? Well, the former is quicker, especially as the System stack does 
not have to be restored to its original value (cleaned up) on return (e.g. line 14) 
to compensate for its decrementation. This is useful, as the majority of function 
calls only pass a single parameter, but of course whatever was on the System 
stack before will be overwritten. This compiler gets around this either by having 
created a frame which is overly large (see Fig. 19.21 1 or else when registers were 
saved on the System stack after the frame was opened (line 4) a sacrificial register 
was put away. In either case the stack content is irrelevant at the time when 
the parameters are sent out, and can be overwritten. This compiler specifies 
that registers D3, D4, D5, A3, A4, A5, A6 are not to be altered on return from a 
function. Thus, as mai n() uses D5, it is saved in line 4 together with DO, whose 
value on return is unspecified; that is the sacrificial register. 

Compiler-specific details like these are irrelevant at the higher level. However, 
they are important when mixi ng C and assembly-level subroutines, as described 
in Section 10.1, and when debugging, see Chapter 15. 

Some compilers pass one or more variables in a register. For example the Cos¬ 
mic/Intermetrics V3.3 6809 C cross-compiler will normally put the first copied 
value in Accumulator_D if char, short or i nt, and pass copies of any subsequent 
variables through the System stack in the normal way. In either case the objects 
are always known only to the called function, living either in the local frame or 
register set. Fig. 19.2 1 illustrates the frame as seen by power(). Notice how the 
passed variables are referenced above the local Frame Pointer A6, that is y at 
A6' +11 and exp at A6' +15, whereas the internal variables are below at A6' -8 for 
resul t and A6' -12 for ol cLresul t. Notice the sacrificial long word at the bot¬ 
tom of the stack, which is overwritten in lines 31 and 34 by the single parameter 
passed to abs(). 

The concept of scope as the lifetime of an object was introduced in Section 8.2. 
Let us look at this in more detail. Objects defined inside braces are local to that 
multiple statement only. Thus, in the code fragment: 
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even byte odd byte 


A6’+ 1 2 


A6’ + 8 


A6’ + 4 


A6’ 


A6’ —4 


A6’-8 


A6’— 1 2 


SP 

after line 20 


A6’-1 6 


1 

0000 x/exp 

, A6’ + 1 5 

1 

0000 0000 

1 

1 

sign n/y 

, A6’ + 1 1 

1 

sign sign 

i 

i 

PCL 

| 

1 

PCH 

1 

1 

old A6 L 

1 

1 

old A6 H 

I 

1 

result L 

| 

1 

result H 

I 

1 

old_result L 

| 

1 

old_result H 

| 

1 

temporary L 

| 

1 

temporary H 

| 

1 

not used 

| 

1 

not used 

_1_ 


\ 


Line 9 


> x expanded to int (= exp) 


< 


Line 12 


> n expanded to int (= y) 


< 


Line 13 


>JSR power 


< 


Line 20 


> LINK A6 o #-16 


Frame Pointer A6’ 


Figure 9.2 The System stack as seen from within power (), lines 21 -38. 
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functionO 

{ 

i nt i ; 

{ 

do lots of things with i; 

} 

{ 

inti; 

do more things with a different i; 

} 

etc 

} 

The two is are different, and the lower i is known only down to its local }. 
Outside this redeclared area the first i is known. 

Generally, variables are declared at the opening function brace and disappear 
from view at the closing brace. When the function is re-entered, stati c vari¬ 
ables (placed in absolute memory) will have kept their last value, whereas auto 
variables (assigned space in a stack-based frame) have lost theirs. 

Unless a function is declared stati c 0, its identifier is broadcast as an exter¬ 
nal object; that is it is declared public or glo bal a t assembly level, and as such is 
known to the linker. In lines 56 - 58 of Table [971T b). the three labels _mai n, _abs 
and _power are declared . GLOBL (similar to . PUBLIC), as would be expected. This 
means that any other file which has been separately compiled for future linking 
can use, say, function power (), as the assembly line I SR _P0WER will be recog¬ 
nized by the linker (see Section 7.2). However, the declaration: 

extern int powerfsigned char y, unsigned char exp); 

must appear in this separate file either before the first function (in which case its 
scope is the entire file) or else in functions which call it. This tells the compiler 
that the function powe r () will be found elsewhere through the linker. The extern 
qualifier in C will generate a . EXTERNAL or XREF directive at assembly level, for 
example: 

.external _power 

Variables can also be made publicly known. If an object is defined outside 
a function, then it is globally recognized. For example, the variables Seconds, 
Mi nutes and Hours in Table [8761 are not only known to cl ockO but to any func¬ 
tion defined afterwards. Usually global variables are define d at the head of the 
file, and so are known to mai n() and everything else. Table [TTTsT b) shows the re¬ 
sulting assembly code for an external object (Array [] in line Cl). By convention, 
such variables are identified by a leading capital letter. Where they are used by 
separately compiled files through the linker, they must be announced by using 
the keyword extern, for example: 

extern unsigned char Seconds, Minutes, Hours; 
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Such a statement is considered a declaration — as no storage is granted — not a 
definition. 

Like stati c variables, global (known as extern) variables are allocated abso¬ 
lute memory locations (see Table libSl b)). They can be initialized where they are 
defined and behave in the same way; that is, the initial values are given at load 
time. If no initialization value is given, C specifies an implicit zero value. 

Rather confusingly, a variable declared outside a function can be qualified by 
the stati c keyword. If this is done, the variable is known from that point on 
only wit hi n the file in which it appears. It will not be passed through the symbol 
table to the linker as a global object, stati c externally defined objects are stored 
and initialized in the same way as ordinary (i.e. across files) defined objects. 

In review, it is important to distinguish between a stati c variable declared 
inside and one declared outside a function. They are both stored in absolute 
memory (i.e. not in a frame) and are both initialized in the assembler by using 
data storage directives (e.g. . BYTE, . DS etc). The former is only locally known 
within its function. The latter is known throughout its file (if declared at the 
top) but not beyond it. Leaving out the stati c qualifier on an externally defined 
variable broadcasts its name to all files through the linker. However, to use such 
a variable in an outside file its name and properties must be declared in such 
outside files qualified by the extern keyword. 

Functions too are globally known from their declaration onwards and to ex¬ 
ternal files. However, qualifying a function definition by the keyword stati c 
restricts its scope to its local file (?). Thus replacing line C13 in Table FTTH :) v: 

static int power(signed char y, unsigned char exp) 

will not generate assembler line 58, that is the identifier _power is not broadcast 
as public. 


9.2 Arrays and Pointers 

As far as C is concerned, an array is a set of objects of the same type. Although it 
does provide the specific array operator [], no special array-oriented procedures 
are supported. 

Arrays must be defined in the same way as all other C objects; some examples 
are: 

static unsigned long arr[1024]; 
auto int fred[256]; 

const static unsigned char table_7[10]; 

The first defines an array named arr[], comprising 1024 consecutive long- 
words in absolute memory. At assembly level the reservation will be labelled 
something like _arr: . doubl e [1024]. Thus arr is actually the address of the 
first element of the array (e.g. see Table 1931 b). _Array:). 
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The second definition reserves 256 units in the frame. Although these loca¬ 
tions are in relative memory, the root name f red in the C source still refers to 
the address of element fred[0]. 

The final statement defines ten consecutive bytes of constants, beginning at 
_tabl e_7:, which are in absolute memory, probably destined for ROM. In prac¬ 
tice, this is useless as it stands as an initial value must be given as part of its 
definition, otherwise it will be filled with zeros by the compiler — in the normal 
way for stati c objects — which will generate an assembly-level line something 
like: 

_table_7: .byte 0,0,0,0,0,0,0,0,0,0 

As the array is const, no subsequent change can be made to any element. 

The size of an array must be determinable by the compiler, which in prac¬ 
tice means the use of a constant d im ension specifier or its equivalent during its 
definition. Incidentally the si zeof operator works with arrays, and indeed any 
C object. Thus si zeof (f red) will yield 1000 for a 4-byte i nt implementation. 

An array can hold any type of object, and can have any of their attributes. 
However, it is unlikely that the compiler will pay any attention to a register 
qualifier. Each element will have the characteristics expected of it according to 
its type, including initialization properties. Thus an auto array is initialized at 
run time on each entry to its local sphere of influence, whilst stati c and global 
arrays are set up once and for all at load time, otherwise are zero. 

Some definitions of initialized arrays are: 

int factor[5] = {1,1,2,6,24}; 

static unsigned const char square[] = {0,1,4,9,16,25,36,49,64,81,100}; 

In the latter case the dimension of the array was not given, the compiler tak¬ 
ing it as the number of initializers (i.e. eleven); and array elements square [0] 
to square[10] will have the values shown. The dimension n, specified either 
explicitly or implicitly in a definition, is the number of elements. However, as 
element 0 is the first, the final element is n - 1. Thus the first example really 
means factor[0] = 0, factor[1] = 1, factor[2] = 4, factor[3] = 9, and 
factor [4] = 16. There is no factor [5]! The compiler will not warn you of er¬ 
rors like this, and using undefined elements is a fruitful source of obscure errors. 
If the explicit array size parameter is greater than the number of initializers, all 
unspecified elements are assumed to be zero, unless an auto storage-class array. 
Old C did not permit auto array initialization, but this is not true of ANSII C. 

At any time an element m of an array can be referred to by following the root 
name by [m]. The resulting object can then be treated like any other C object 
of the same type. As an example, consider the following code fragment which 
applies the 3-point low-pass filter transformation, defined on page 11101 to an 
array of 256 elements: 

for (i = 255, i > 1, i--) 

{array[i] = array[i]/2 + array[i-l]/4 + array[i-2]/2;} 
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Table 9.2 Generating factorials using a look-up table. 
unsigned long factor(int n) 

{ 

static unsigned const long array[13] = 

{1,1,2,6,24,120,720,5040,40320,362880,368800,39916800,479001600}; 
if(n>12) {return 0;} 
return(array[n]); 

} 

(a) C source code. 



. text 




. even 



L5_array 

: .long 

1 

array[0] 


.long 

1 

array[l] etc. 


.long 

2 



.long 

6 



.long 

24 



.long 

120 



.long 

720 



.long 

5040 



.long 

40320 



.long 

362880 



.long 

368800 



.long 

39916800 



.long 

479001600 

array[12] 

* 1 u 

nsigned long factorfint n) 

* 2 

{ 




. even 



_factor: 

link 

a6,#-4 


* 3 

static 

unsigned const 

long array[13] = 


{1,1,2 

6,24,120,720,5040,40320,362880,368800,39916800,479001600} 

* 4 

if(n>12) {return 0;} 


cmpi . 1 

#12,8(a6) 

Compare n living in [A6]+8 with 12 


bl e. s 

LI 

IF lower or equal THEN continue 


moveq. 1 

#0, d7 

ELSE exit with 0 in D7.L 


uni k 

a6 



rts 



* 5 

return(array[n]); 


LI: 

move.1 

8(a6),d7 

Get n 


asl . 1 

#2,d7 

Multiply by 4 to match array element size 


move.1 

d7,al 

and put in Al.l 


adda.1 

#L5_array,al 

Add it to the address of array[0] 


move.1 

(al),d7 

which then points to array[n]. Get it 


uni k 

a6 

and return 


rts 
. globi 

_factor 


* 6 

} 




(b) Resulting 68000 assembly code. 


Thus array[255] = ^array[255] + ^array[254] + ^array[253] etc. 

A look-up table is a synonym for an array, usually, but not always, of con¬ 
stants. Table [9721 shows this technique used to generate our old friend, the facto- 
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rial. Here an array of 13 elements hold the equivalents of 0! to 12!. The array, or 
table, is declared in line C3 as stati c (i.e. stored in absolute memory) and const, 
together with the appropriate values. The const qualifier forces the compiler to 
place these values in the text program section along with the program, both of 
which will be in ROM in an embedded implementation. This storage is simply 
thirteen sequential long-words beginning at L5_array:, in the same manner as 
in Fig. I9.3l b). 

The resulting assembly-level program itself uses the array index n multiplied 
by four (to match the element size) to give the offset from the base address. 
Putting this in an Address register (Al) and adding the base address L5_array 
to it gives the position of array [n]. This is then transferred to D7.L for return. 

Multi-dimensional arrays can be implemented in C, although their use is rare. 
Some example definitions are: 

unsigned char calendar[12][31]; /* 12 months of 31 days each */ 

int x[100][12]; /* 100 rows of 12 columns */ 

char x[3] [2] = {7,6},{9,13},{0,5}; /* 3 rows of 2 columns 

x[0][0]=7, x[0][1]=6, x[1][0]=9, x[l][1]=13, x[2][0]=0, x[2][l]=5 */ 

Higher-order arrays are treated as arrays of array objects. In this ma nn er 
cal endar [12] [31] can be considered as 12 objects (months), each of which 
comprises an array of 31 days holding unsi gned char objects. Noting from Ta¬ 
ble 18.41 that [] associates left to right, the 2-dimensional array x[3] [2] can be 
written as (x [3]) [2] , that is an array called x of three objects, each of which con¬ 
tain two elements. Fig. l9.31 b) shows the memory organization of a 2-dimensional 
array. As can be seen, the rightmost subscript varies fastest as we go up in mem¬ 
ory, with one complete pass for an increment of the next left index. In accessing 
any element [m] [n], the relationship (m x COL x w) + n must be calculated, where 
m is the row, COL is the number of columns, w is the element size in bytes and n is 
the column co-ordinate. Adding this offset to the base address of array[0] [0] 
gives the element address. The principle applies for any number of dimensions. 

We have seen in Section 9.1 that variables are normally passed to functions 
by copying their value into a stack prior to the call. In general it is feasible to 
copy an entire array through a stack, but this technique is not used in C because 
of the overheads in time and space. For example, the array x[100] [12] would 
need 1200 Copy and Push actions prior to the call, which is hardly efficient. 
Instead the address of the base element is passed through the stack. With access 
to this pointer, the function can reference any element as described. Careful 
consideration of this shows that passing the address permits the function to 
change the actual array elements and not copies, as is the case where simple 
objects are involved. 

As an example of a function acting on arrays of data, consider a block copy 
operation where a number of elements from [0] to [1 ength-1] are to be copied 
from one array to another. The calling program must pass three parameters: the 
base addresses of arrayl and array2, and length. As the base address of an 
array is just its root name, the calling routine will include lines similar to these: 
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ar 

ar+1 

ar+2 

ar+3 

ar+4 

ar+5 

ar [0] 

ar[l] 

ar [2 ] 

ar [3] 

ar [4] 

ar [5] 


(a) A 1-dimensional array ar[n] of six elements. 


ar[0][0] 

ar+l 

ar[0][1] 

ar+2 

ar[0] [2] 

ar+3 

ar[0] [3] 

ar+4 

ar [0] [4] 

ar+5 

ar[0][5] 

ar+6 

ar[1][0] 

ar+7 

ar[1][1] 

ar+8 

ar[1] [2] 

ar+9 

ar[1] [3] 

ar+10 

ar [1][4] 

ar+l 1 

ar[1][5] 

ar+12 

ar[2][0] 

ar+13 

ar[2][1] 

ar+14 

ar[2] [2] 

ar+15 

ar[2] [3] 

ar+16 

ar[2][4] 

ar+17 

ar [2] [5] 

ar + 18 

ar[3][0] 

ar+19 

ar[3][1] 

ar+20 

ar[3] [2] 

ar+21 

ar[3] [3] 

ar+22 

ar[3][4] 

ar+23 

ar[3][5] 


Address of ar[p] [q] 

= ar+(p*6+q)*w 
where w = element size 


bytes. 


(b) A 2-dimensional array ar[m] [n] with 4 rows of 6 columns, 24 elements. 


callerQ 


Figure 9.3 Array storage in memory. 


long n; /* Length parameter 

char block_x[256], block_y[256]; /* Two arrays of 256 bytes 


void block_copy(char arrayl[], char array2[], long length); 
/* Declaration prototype of function block_copy() 


block_copy(block_x, block_y, n); /* Call up function block_copy() 

This would result in the first n elements of array block_y[] being physically 
replaced by the first n elements of array bl ock_x []. 

The code itself, reproduced in Table liOT b). shows that the address of ar ray2 []' 
base was passed in a long-word 12-15 bytes above the Frame Pointer (A6) and 
that of arrayl[] in locations 8-11 bytes above. Parameter length is passed 
by copy in [A6]+16/19. Array element i's address is calculated as i (stored in 
D5.L) plus the relevant base address. This is a pity, as the sequential nature of the 
process would suit the use of the Address Register Indirect with Post-Increment 
mode to creep up (walk) through the block, such as in lines 23 and 25 of Ta- 
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Table 9.3 Altering an array with a function. 
void block_copy(char arrayl[],char array2[],1ong length) 

{ 

register long i; 
for(i=1ength-1;i>=0;i 

{array2[i] = arrayl[i];} 
return; 

} 


(a) C source code. 


_bl ock_ 


LI: 


void block_copy(char arrayl[],char array2[],long length) 

{ 

. text 
. even 
copy: 

link a6,#0 

move.l d5,-(sp) 

register 1ong i; 
for(i=length-l;i>=0;i - -) 
move.l 16(a6),d5 ; i (in D5.L) 

subq.l #l,d5 ; minus one 

tst.l d5 ; i>=0? 

blt.s Lll : IF not THEN 


Lll: 


equated to length passed in [A61+16 


pass on 


{array2[i] = arrayl[i];} 


movea.1 
adda.1 
movea.1 
adda.1 
move.b 
subq. 1 
bra. s 
move.1 
uni k 
rts 

. globl 
return 
} 


d 5, al 
12(a6),al 
d 5, a2 
8(a6),a2 
(a2),(al) 
#1, d5 
LI 

(sp)+,d5 

a6 

_block_copy 


i to Al.L 

added to array2's base address passed in [A61+12 
i to A2.L 

added to arrayl's base address passed in [A6]+8 
arrayl[i] moved to array2[i] 
i -- 

and repeat 


(b) Resulting 68000 code. 


ble I S.7 1 However, the code size of 40 bytes compares well with that of 39 in the 
hand-assembled version. 

Notice how an array is denoted in the function prototype by the root name and 
empty square brackets, for instance arraylf]. A size is not necessary, although 
it can be added for clarity if desired. Multi-dimensional arrays must give the 
size of each dimension, except the leftmost. As we have seen, this is in order to 
calculate the address of any element. Care must be taken, as the compiler does 
not check for overrun; thus reference to array[9] in an array defined to have 
four elements is accepted, and the contents of memory location array + 9 x w 
actually fetched or, worse still, changed! 
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What if we wanted to copy the contents of a ROM into a RAM, as was the case in 
Table [577^ The function will be exactly the same, but this time we must pass the 
start address of the two chips. We have seen that we can determine the address 
of an array by just referring to its root name; can we extend the principle? The 
affirmative answer to this leads us to one of C's strengths, the use of pointers. 

A pointer is a constant or variable object holding information relating to where 
another object is stored. In MPUs with linear addressing techniques, such as most 
8-bit devices and the 68000 family, this is just the absolute address. In segmented 
architectures, as exhibited by the 8086 family, typically near and far pointers 
exist; the former holding the address within the current segment (usually two 
bytes) and the latter the segment:address (usually four bytes). 

Pointers may be taken of any C object, except a regi ster type, by using the 
address-of unary operator & (see Table 18.41 . For example, if a variable x exists, 
then its address can be assigned to the pointer variable ptr thus: 


ptr = &x; 


where pt r has previously been suitably defined. 

Conversely, if we have a pointer, then we can get the object it points to by 
using the indirection unary operator contents-of (*). Thus if we have a pointer 
ptr then: 

y = *ptr; 

which reads from left to right, y is assigned the contents of address ptr. 

We now have the problem of what value will a pointer have if the pointed-to 
object is bigger than one byte, say a 4-byte long-word variable or 100-word array? 
And how would the construction ptr+1 be interpreted? In assembly/machine 
code, the address is normally the lowest byte address of the object, for example: 

MOVE.B DO,OCOOOh ; [C000] -> D0(7:0) 

MOVE.L DO,OCOOOh ; [C000:C001:C002:C003] -> D0(31:0) 

and C uses the same convention. Thus, the value of a pointer to the 100 short- 
element array (&ar[0j) stored in memory at OxCOOO — 0xC063 will simply be 
OxCOOO. In the case of an array, we can use the root name as the base address; 
hence: 

ptr = &ar[0]; 
ptr = ar; 

are the same, and ptr will be a pointer to whatever kind of object the array 
comprises, provided of course that it has been previously defined as such. 

Although the storage size of the base address is fixed, and is independent 
of the object, pointers do take on the type of their referred-to object. Thus, for 
example, we can have pointer-to-i nt and pointer-to-f 1 oat entities. This, and 
other properties, are bequeathed to the pointer variable during its definition. In 
common with any other object, all pointer variables must be defined before use. 
Some examples are: 
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char * port; /* port is a pointer variable to a char object */ 

float * pvarl, * pvar2; /* pvarl & pvar2 are pointers to float objects */ 

void * point; /* point is a generic pointer */ 

In the first instance, the pointer variable port is brought into being and de¬ 
clared to be addressing a char object. This can be read as ' the contents of port 
is a char’. Alternatively, the * indirection operator can be transcribed as pointer 
to', if read right to left; that is ' port is pointer-to a char object'. The second ex¬ 
ample creates two pointer-to float objects, namely pvarl and pvar2. This reads 
(from right to left) pvarl is a pointer-to a float, pvar2 is a pointer-to a float. 

The final definition is rather enigmatic, as it appears to be saying that poi nt is 
a pointer-to nothing! A pointer to void type is treated as a generic pointer (pure 
address) and can be cast to any real type and back without any loss of integrity. 

The concept of different types of pointers is important when dealing with 
pointer arithmetic. Pointers can be incremented/decremented, added/subtract¬ 
ed with constants and pointers of the same kind and compared with pointers of 
the same kind. Consider a pointer pvar having a value at some instant of OxCOOO, 
then: 

pvar += 4; 

will have what value? If pvar is a pointer-to char (or voi d) then 0xC004 will be 
the answer, but if pvar is a pointer-to 1 ong (or f 1 oat) then OxCOl 0 is the answer. 
Thus in pointer arithmetic, constants indicate objects, and are multiplied by their 
sizes for the purposes of any calculation. 

A more sophisticated example of pointer arithmetic is given by the example: 

for (i=0; i<100; i++) {*(ar+i) =0;} 

/* Contents of object pointed to by ar+i is cleared; i = 0 to 99, step 1*/ 
which is similar to: 

for (i=0; i<100; i++) far[i] =0;} 

both of which clear an array of 100 elements. 

If the compiler is to make sense of the pointer operation ar + i , then it must 
know what type of object ar points to. In this case it will know from the prior 
definition of the array ar []. Thus, if this was 1 ong ar [100] , then ar + i would 
actually be calculated as ar + 4 x i . 

The root array name can be used as the parameter of the sizeof operator. 
Thus sizeof (ar) in the example above will return 400 as the storage used by 
the array. If the si zeof operand were just an ordinary pointer (i.e. not an array 
root), then the size of the pointer itself would be generated, typically 2 or 4 bytes. 

Pointer types can be to any object of any complexity. As an example consider 
our 2-dimensional array of chars, calendar[12] [31]. This has 12 arrays each 
comprising 31 char elements. We know that calendar is the address of the 
element calendar[0] [0] and that it is a pointer-to char type. Following this 
argument, what then is calendar [10]? Actually it is the address of November, 
that is where the 11th array of 31 days begins. What will the compiler make of the 
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statement calendar [10] + 1;? The result of the addition will be the address 
of December, that is 31 chars on, or calendar [11]. Thus, calendar [10] is a 
pointer-to array-of-31-chars type! If a pointer variable is to be assigned to such 
a type, then it must be defined accordingly, for example: 

char (* month)[31]; /* Declare month as pointer-to-array-of 31 chars */ 

month = calendar[0] + i; 

The complex definition of the pointer month reads from inside the parentheses 
going left and then right: month is a pointer-to / an array of 31 / chars. Paren¬ 
theses must be used, as [] is of higher precedence than *. Pointer variable month 
can then participate in pointer arithmetic where the other pointer variables are 
of the same type. 

The pointer variable itself may be given properties, suc h as being const or 
stati c (i.e. stored in absolute memory locations, see Table UTsl) . for example: 

static unsigned char * const port; 

which says that port is a const pointer-to an unsi gned char object andis stored 
statically. Besides giving port its arithmetic properties, the compiler has been 
told to store it in absolute memory and flag any attempt to change it (typically 
placed in ROM in an embedded system). 

In dealing with the interface to hardware, the software designer must be able 
to direct data to and from ports at known addresses. Thus in C, we must be able 
to assign specific addresses to pointers. In ANSIIC a pointer can only be assigned 
a value if their types match; thus the statement: 

char * port = 0x9000; /* !!!!!!!!!!! Incorrect */ 

in an attempt to make port point to 9000 h will fail, as 0x9000 is considered an 
i nt type constant (old C, however, permitted this cross-type assignment). The 
way around this problem is to use a cast to convert the i nt constant to a pointer- 
to type; thus: 


^ ^ _ Casts 0x9000 to a pointer-to-char type 

char * port = (char*)0x9000; /* Correct */ 

which now states that port is a pointer-to a char type variable, and its value is 
0x9000. The cast reads (right to left) pointer-to-char type. 

On this basis, if we wished to call up the function of Table [931 to copy a ROM 
starting from EOOOh to RAM starting at 2000 h of length 1 000h bytes, then the 
caller would include the following: 

ca]]() /* The caller function */ 


* const R0M_start = (char*)0xE000; /* RONLstart is a pointer */ 

* const RAM_start = (char*)0x2000; /* as is RAM_start */ 


char 

char 
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block_copy(ROM_start, RAM_start, 0x1000); /* Invoke the copy function */ 


or even 
callO 


block_copy((char*)0xE000, (char*)0x2000, 0x1000); 


but the former is more readable. 

We have already noted that * can be translated as contents-of (read from left 
to right), and can be used as such to access the contents of any object of which 
we have an address. Thus the statement: 

z = *port; /* Same as z = *(char*)0x9000; - i.e. contents of 9000h */ 

says that z is assigned the contents of the variable pointed to by port. As we 
have previously (page [753]) defined port as a pointer-to a char at 0x9000, then 
effectively we are assigning z to the byte stored in 9000. 

Consider the situation depicted in Fig. 19.41 where one of several byte (char) 
ports drives 7-segment displays. We need to write a function to interface this 
port, which will accept a number from 0 to 9, and the address of the destination 
port. Functions interfacing to hardware input/output are often known as drivers. 
Calling this driver function outchO (for output character), we would have as its 
declaration: 

int outch(unsigned char number, unsigned char * const port); 

This prototype identifies number as an unsi gned char (i.e. byte) and the address 
variable port as a const (fixed) pointer to an unsi gned char object. The func¬ 
tion is declared as returning i nt, as it is proposed to return -1 to indicate an 
error (defined as n > 9), otherwise 0. Based on this declaration, to display 7 at 
the digit located at 0x9000, we would use the call: 


| ^ ^ _ Casts 0x9000 to pointer-to-unsi gned char type 

outch(7, (unsigned char*)0x9000); 

Notice the cast (unsigned char*), which is necessary to match the constant 
address to the same pointer-to type as port. 

The program given in Table [93] is straightforward. The code conversion is 
done by means of a look-up table, as described in Table [9721 The extracted value 
is moved to port in line C5, by the simple expedient of saying that the contents 
of port are the nth entry of the table. Notice that in line C3 the table is an array 
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of statically stored constant characters (i.e. bytes), and has thus been assigned to 
the _text program section. In an embedded system this will be in ROM. Objects 
which are statically stored are considered to have been given their initial values 
at compile time (i.e. during loading), not run time. At assembly level this appears 
as . BYTE (or equivalent) directives. The const qualifier ensures that an attempt 
to modify the table values will be flagged as erroneous, which is sensible if the 
table is in ROM! Thus these initial values are the only values that the table will ever 
have. The compiler evaluates the size of the array from the number of initializers. 

The example of Table l9~4l shows that it is as easy to pass a pointer through to 
a function as a copy of an object. This can be exploited to change an object itself 
in a function, rather than the copy. Just send a reference to the object and not 
the copy (e.g. &x, not x). The contents of that object can then be changed at will; 
thus (rather trivially) to add ten to an i nt object x we have: 

void add_ten(int * pvar) /* pvar is a pointer to an int object */ 

{*pvar += 10;} /* Get inside it and increment by ten */ 
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Table 9.4 Sending out a digit to a 7-segment port. 
int outch(unsigned char number, unsigned char * const port) 
{ 

static const unsigned char table_7[] = 

{0x20,0x79,0x24,0x30,0x19,0x12,2,0x78,0,0x10}; 


if(number>9) 

{return -1;} 

/* Return error 

V 

*port = table. 

_7[number] ; 

/* or *port = *(table_7+number); 

V 

return 0; 


/* Return success 

V 


} 

(a) C source listing. 

; Compilateur C pour MC6809 (COSMIC-France) 

.processor m6809 
.psect _text 

L5_table_7: .byte 32,121,36,48,25,18,2,120,0,16 ; 7-segment values 

; 1 int outch(unsigned char number, unsigned char * const port) 

; 2 { 

_outch: pshs u ; Open frame (not needed!) 

leau ,s 


3 static const unsigned char table_7[] 

= {0x20,0x79,0x24,0x30,0x19,0x12,2,0x78,0,0x10} ; 


) 

4 

if(number>9) {return -I 

.;} /* Return error 

*/ 



1 dx 

4, u 

Get number passed in [U]+4 




cmpx 

#9 

>9? 




jble 

LI 

IF not TFIEN continue from LI: 




ldd 

#-l 

ELSE return with -1 in D 




jbr 

L4 

L4 is the exit point 


J 

5 

*port = 

table_7[number] ; 

/* or *port = *(table_7+number); 

V 

LI: 


1 dx 

#L5_table_7 

Point X to bottom of table 




ldd 

4, u 

Get number again 




ldb 

d, x 

Get word at number+table_7 bottom ([D]+[X]) 



stb 

[6, u] 

Store at address passed in [U]+6; i.e. 

port 

J 

6 

return 

0; 

/* Return success 

*/ 



cl ra 






cl rb 




L4: 


1 eas 

,u 

Close frame 




pul s 

u, pc 




7 } 

.public _outch 
. end 


(b) Resulting assembly code. 

which is called up as: 

add_ten(&x); 

The variable *pvar (i.e. x) can be manipulated in exactly the same way as any 
ordinary' variable. The Contents-Of operator *, in common with all other unary 
operators, has a high precedence and thus parentheses need rarely be used; for 
example: 

z = *(pvarl)/5 + *(pvar2)*7; 
z = *pvarl/5 + *pvar2*7; 
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are equivalent. However, care must be taken when the ++ and — unary Incre¬ 
ment and Decrement operators are used, as these have the same precedence. As 
unaries read from right to left, we have as an example: 

x = *pvar++; /* Increment pointer and take contents of */ 

x = (*pvar)++; /* Increment contents of pvar */ 

x = ++*pvar /* Same as above */ 

As we have already observed, incrementing a pointer is not the same as in¬ 
crementing a normal variable. Instead of one being added, a constant equal to 
the size of the object being pointed to is summed. Thus, incrementing a pointer 
to a long variable will usually add four. In a si mi lar manner, decrementation, 
addition and subtraction are scaled. Only pointers of the same kind can be com¬ 
pared, added or subtracted. As we have seen, this same-type rule also applies 
to assignments, including constants. Assignments and comparisons with voi d 
pointers are also permitted. 

There is some exception to the assignment rule, as a pointer can always be 
assigned to or compared with 0. ANSII C guarantees that no object ever lives 
at 0, thus a function returning a pointer can use this to indicate an error situation. 
Care needs to be taken in processors that can physically use address 0000/t, such 
as the 6809, to avoid storing any variable there. 

To declare a function returning a pointer, we use the declaration syntax: 

int * fredfparameter list); /* fred() returns a pointer to int */ 

Pointers to a function (i.e. where the function begins) are also possible (see also 
Section 10.1), thus: 

int (*fred)(parameter list); /*fred is a pointer to the function fred()*/ 

Hence, it is possible to store a table of pointers to functions (see page |281t . fre¬ 
quently seen in assembly-level programs as jump tables QQ. Pointers to pointers 
can be defined ad infinitum, if rarely used. 

We introduced pointers by noting that arrays were handled using addresses, 
especially when being passed to functions. Thus, by inference it is possible to 
use pointer rather than array notation in such functions. We did this as one way 
of clearing a 100-element array. Another example is given in lines C3 and C5 of 
Table fh4l which can be replaced by: 

unsigned const char * table_7 = {0x20, ., 0x10}; /* 7-segment table */ 

*port = *(table_7 + n); /* Send out nth entry */ 

Pointer notation is of course more comprehensive than array notation, and as 
such is more flexible. Most texts state that the use of the former will often lead 
to better code production by the compiler. However, it is the author's opinion 
that with modern compilers this is rarely so; thus whichever notation is clearest 
shoul d be used. For example, in the case of a memory copy, the prototype of 
Table [973] (line Cl) would be more obviously presented as: 

void block_copy(char * R0M_start, char * RAM_start, long length) 
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whereas array notation is more relevant to Table 19741 

As our final example, we will repeat the array scan and update procedure of 
Table 16.21 this time coded in C. Algorithm and hardware details are given on 
page 11521 Object identifiers have been kept the same to facilitate a comparison 
between the hand-assembled and C versions. 

The array of 256 short i nts is defined outside a function to give it global sta¬ 
tus, as can be seen by the .GLOBL _Array directive at the bottom of Table [931 b). 
Conventionally, such objects are identified by an initial capital letter. 

In the di spl ay () function, the objects dac_x and dac_y are declared as con¬ 
stant pointers to a char (byte) and short (word) respectively, and given the value 
0x6000 and 0x6002 respectively. An endless loop then sends out the values 
Array[x_coord] and x_coord to the Y and X digital to analog converters. We 
rely on x_coord being a byte-sized (unsi gned char) object and folding over af¬ 
ter 255 (FFh). 

The update() function is entered via an interrupt (see Table IT0T21 ) and resets 
the external interrupt flag (see Fig. 16.6) before reading the counter. Both counter 
and i nt_f'1 ag are declared constant pointers to their respective addresses/object 
types in lines C20 and C21. The variables 1 ast_ti me and update_i, respectively 
holding the counter reading from the last interrupt and the array index to put 
the new difference in, are defined as static, so that they can remember their 
previous value. Notice how they are assigned the fixed locations L5_last_time 
and L31_update_i in the .data program section. Although they will reside in 
RAM just before the 256 array words, they have not been declared global. 

Updating the array element simply means taking the contents of counter 
(see Fig. 16.1k that is -'counter, and subtracting the last reading. This difference 
(beat-to-beat variation) is assigned to Array [update_i ++] inline C23, which also 
increments the array element. As in x_coord, use is made of the byte nature of 
update_i to provide wraparound at 256. 

Finally, a comparison of the listings of Tables f6?2l and 1701 gives 99 bytes for 
the former and 152 for the latter. This excludes the data, vector table and the 
latter's strategy for dealing with interrupts. 


9.3 Structures 

We have seen that arrays are data structures grouping many objects having the 
same type under a single name. Many real situations require organizations of 
objects having many different properties, but coming under the same banner. As 
an example, consider a monitoring system in a hospital ward containing up to ten 
patients. Treating each patient (rather unfeelingly) as a composite object, then a 
record would contain data such as the hospital number, age, date and an array of 
physiological readings, such as heart rate, temperature, blood pressure etc. This 
would be continuously gathered, and perhaps once an hour transferred to a file 
on magnetic disk for later analysis. These ten objects could be defined in C as: 


struct mecLrecord 


/* Definition for mecLrecord structure */ 
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Table l9~5l Displaying and updating heart rate (continued next page). 


short Array[256]; /* Global array of 256 words */ 

/* The background routine is defined here */ 

void display(void) 

{ 

register unsigned char x_coord; /* The x co-ordinate */ 

char *const dac_x = (char*)0x6000; /* 8-bit X-axis D/A converter */ 

short *const dac_y = (short*)0x6002; /* 12-bit Y-axis D/A converter */ 

while(l) /* Do forever */ 

{ 

*dac_y = Array[x_coord]; /* Get array[x] to Y plates */ 


*dac_x = x_coord++; /* Send X co-ordinate to X plates */ 

} 

} 


/* The foreground interrupt service routine is defined here */ 

void update(void) 

{ 

static unsigned short last_time; /* The last counter reading */ 

static unsigned char update_i; /* The array update index */ 

short * const counter = (short*)0x9000; /* The counter is at 9000/lh */ 

char * const int_flag = (char*)0x9080; /* The external interrupt flag */ 

*int_flag = 0; /* Reset external interrupt flag */ 

Array[update_i++] = ’'-'counter-!ast_time; /* Difference is new array value */ 

last_time = ’’’counter; /* Last reading is updated */ 

} 


(a) C source code. 


1 short Array[256]; /* Global array of 256 words */ 

2 void display(void) 

3 { 


_display: 

* 4 

* 5 

* 6 

* 7 

* 8 

* 9 

LI: 


10 


. text 
. even 

link a6,#-10 ; Open frame 

register unsigned char x_coord; /* The x co-ordinate */ 

char *const dac_x = (char*)0x6000; /* 8-bit X-axis D/A converter */ 

move.l #6000h,-6(a6) ; Store address 6000h in [A6]-6 

short *const dac_y = (short*)0x6002;/* 12-bit Y-axis D/A converter*/ 
move.l #6002h,-10(a6) ; and address 6002h in [a6]+10 
while(l) /* Do forever */ 


* 

dac y = Array[x 

_coord] ; /* Get array[x] to Y 

plates */ 

movea.1 

-10(a6),al ; 

Point Al to 6002h 


moveq.1 

#0, d7 



move.b 

-I(a6),d7 ; 

Get x_coord in [A6]-l 


movea.1 

d7,a2 ; 

Put in A2 


adda.1 

a2,a2 ; 

Crafty way of multiplying by 2 for 

word array 

adda.1 

#_Array,a2 ; 

Add Array base address 


move.w 

(a2),(al) ; 

and move Array[x] to dac_y (6002h) 



*dac x = x 

_coord++; /* Send X co-ordinate to 

X plates */ 

movea.1 

-6(a6),al ; 

Point Al to dac_x (6000h) 

move.b 

-l(a6) ,d7 ; 

Get x_coord 


addq. b 

#1,-l(a6) ; 

Increment it 


and. 1 
move.b 

#255,d7 
d7,(al) ; 

} 

LI 

Send it out 


bra. s 




11 
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Table 9.5 (continued! Displaying and updating heartbeat. 

*13 

* 14 /* The foreground interrupt service routine */ 

* 15 

* 16 void update(void) 

* 17 { 

* 18 static unsigned short last_time; /* The last counter reading */ 

* 19 static unsigned char update_i; /* The array update index */ 

. data 
. even 

L3_last_time: .=.+2 ; Reserve 2 bytes for last_time 

L31_update_i: .=.+1 ; Reserve one byte for update_i 

* 20 short * const counter = (short*)0x9000; /* The counter is at 9000:1 */ 

. text 
. even 

_update: link a6,#-8 ; Make frame 

move.l #9000h,-4(a6) ; Use A6-4 to store address 9000h (counter) 

* 21 char * const int_flag = (char*)0x9080; /* The external interrupt flag*/ 

move.l #9080h,-8(a6) ; and A6-8 for 9080h (int_flag) 

* 22 *int_flag = 0; /* Reset external interrupt flag */ 

movea.l -8(a6),al ; Point A1 to int_flag 

clr.b (al) ; and reset it 

* 23 Array[update_i++] = *counter-last_time;/* Diff is new array value */ 

move.b L31_update_i,d7 ; Get update_i 

addq.b #1,L31_update_i ; Meanwhile inc it in memory for next time 

and.l #255,d7 ; Extend to {\tt int} (32 bits) 

movea.l d7,al ; Original value in byte form to Al 

adda.l al,al ; Multiplied by two for word array 

adda.l #_Array,al ; Add the array base; points to Array[update_i] 

movea.l -4(a6),a2 ; A2 points to counter 

move.w (a2),d7 ; Get it 

ext.l d7 ; in long form 

moveq.l #0,d6 

move.w L3_1ast_time,d6 ; Get last_time in long form 

sub.l d6,d7 ; Subtract them 

move.w d7,(al) ; Put away in Array[update_i] 

* 24 last_time = *counter; /* Last reading is updated */ 

movea.l -4(a6),al ; Point A6 to counter 

move.w (al),L3_1ast_time ; Get and put it away as new last_time 

unik a6 ; Close frame 

rts 

.globl _update 
.globl _di splay 
. data 
. even 

_Array: .=.+2 

.=.+510 ; Reserve 512 bytes (256 words) for Array[] 

.globl _array 

* 25 } 


(b) Resulting assembly code. 
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{ 

unsigned long hosp_numb; 
unsigned char age; 
unsigned long time; 
unsigned char day; 
unsigned char month; 
unsigned short year; 
unsigned short array[256]; 

} patient[10]; 

which defines an array of ten composite objects called pati ent [0] ...pati ent [9], 
having the structure defined by the tag med_record. Inside this structure are 
seven objects of various kinds, there can even be other structures. Structures 
in C are analagous to records in Pascal. 

Any member of a structure can be accessed within the scope of the definition 
by using the Dot (structure element) operator; for example: 

patient[6].month = 3; 

makes the object month inside the structure named pati ent [6] = three. 

The tag med_record is optional, and is the name of the structure template. 
Objects can be given this template any time later, for example dog_l and dog_2 
may be defined as: 

struct med_record dog_l, dog_2; 

Thus only a template (which does not cause storage to be allocated) can be de¬ 
clared, and definitions can occur at any following point, for example within other 
functions. 

Taking an example closer to the theme of this book, consider a compound 
peripheral interface such as the 6821 PIA QQ. We have already seen how this 
can be interfaced to a MPU in Figs 11.91 and 13.141 here we look at the internal 
register structure as described in Fig. 19.51 There are six programmer-accessible 
8-bit registers living in an address space of four bytes, as determined by the state 
of the Register Select bits RSO RSI. 

Sharing a slot are the Data Direction and Data I/O registers. Which of the pair 
is actually connected to the data bus when addressed is determined by the state 
of bit-2 of the associated Control register. Each of the eight I/O bits may be set 
to in or out, as defined by the corresponding bits in the Data Direction register; 
for instance if ddr_a is 00001 1 1 1 b, then Data register_A has its upper half set as 
input and lower half as output. Once a Direction register has been set up, then its 
slot can be changed to the I/O port, by setting the appropriate Control register's 
bit-2 high. 

Each of the six component parts of a PIA can be defined as a pointer, in the 
way described in the last section, and treated in the normal way. However, if 
there are several PIAs in the system, then a template for this device as a single 
compound object can be made and used for each physical port of this kind. 

Lines Cl - C9 of Tablc [9T)l a) declare a template describing the PIA as a struc¬ 
ture of pointers, thus each register is characterized as an address. Two PIAs are 
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defined based on this declaration, portO and portl in lines C12 and C13. Some 
of the registers are qualified as pointer-to vol ati 1 e, as bits read from the outside 
world will change independently of the software. I have declared these structures 
to be const and stored in absolute memory, that is stati c. This means that the 
structure elements, which here are constant addresses, will be stored in ROM 
along with the program (assembly lines 3 and 5 in Table FTTif b)) and any attempt 
to change these pointers will be flagged by the compiler as an error. Such struc¬ 
tures are initialized in the same way as a comparable array. Notice in lines C12 
and Cf 3 how the casts are the same as in the template definition. 

Functions can take structures as parameters and return them. In both cases 
the structure name alone is sufficient; for example in line Cl 5 the passed param¬ 
eter is portO, and this causes copies of all six elements to be pushed into the 
stack prior to the Jump (assembly lines 7- 10). 


7 0 


Data Direction register_A 

CRA2 

Data register_A 

CRA2 

7 2 C 


Control register_A 



\ 

Base address 
* <RS0 RSI = 00) 

Base address + 1 
(RSO RSI = 01) 


7 0 


Data Direction register_A 

CRB2 

Data register_B 

CRB2 

7 2 0 


Control register_B 



Base address + 2 
1 (RSO RSI = 10) 

Base address + 3 
(RSO RSI = 11) 


Figure 9.5 Register structure of a 6821 PIA. 
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Pass by copy is the same technique as used for ordinary single objects, and, 
as such, the elements themselves cannot be altered by the function. In the situ¬ 
ation depicted in Table [9ij] the structure elements are pointers, so although we 
cannot alter their copies in function i ni ti al i ze (), we can alter the pointed-to 
variable (i.e. PIA registers) through them. Thus, line C25 means that the contents 
of structure type PIA named port element control_a is assigned to zero, that 
is * port.control_a = 0;. As port is an element by element copy of struc¬ 
ture type PIA named portO (if called up from line Cl5), the contents of portO's 
control_a register are affected. 

Strangely, structure objects are passed by copy, whereas the equivalent pro¬ 
cess with arrays causes a pointer to the array to be passed. This latter is much 
more efficient, as only one object (the pointer) has to be pushed on to the stack 
prior to the Jump to Subroutine, irrespective of the size of the array. Structures 


Cl: 
C2: 
C3: 
C4: 
C5: 
C6: 
C7: 
C8: 
C9: 

CIO 

Cll 

C12 


C13: 


C14: 

C15: 
C16: 


Table f9~6l The PIA as a structure of pointers (continued next page). 


struct PIA 

{ 

unsigned volatile char *data_a; 

unsigned char *ddr_a; 

unsigned volatile char *control_a; 

unsigned volatile char *data_b; 

unsigned char *ddr_b; 

unsigned volatile char *control_b; 

} » 


/* Template for PIA 
/* I/O port A 

/* Data Direction register A 
/* Control register A 
/* I/O port B 

/* Data Direction register B 
/* Control register B 


V 

V 

V 

V 

V 

V 

V 


mai n() 

{ 

static const struct PIA portO = {(unsigned volatile char*)0x8000, 
(unsigned char*)0x8000, (unsigned volatile char*)0x8001, 
(unsigned volatile char )0x8002, (unsigned char*)0x8002, 
(unsigned volatile char*)0x8003}; 

static const struct PIA portl = {(unsigned volatile char*)0x8020, 
(unsigned char*)0x8020, (unsigned volatile char*)0x8021, 
(unsigned volatile char*)0x8022, (unsigned char*)0x8022, 
(unsigned volatile char*)0x8023}; 

void initialize(struct PIA);/* Declare a function accepting a structure*/ 

i ni tial ize(portO) ; 

initialize(portl); 


C17: /* Main body sends out of portl's B reg the sum of portO & portl's A reg*/ 


C18: *(portl.data_b) = *(port0.data_a) + *(portl.data_a); 

C19: } 

C20: /* Function sets up a PIA as a simple input A and output B port */ 


C21 

C22 

C23 

C24 

C25 

C26 

C27 

C28 

C29 


void initialize(struct PIA port) 

*(port.control_a) = 0; 

* (port. ddr_a) = 0; 

*(port.control_a) = 04; 

* (port. control_b) = 0; 

* (port. ddr_b) = OxFF; 

*(port.control_b) = 04; 


(a) C source code. 
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Table [9~6| The PIA as a structure of pointers (continued next page). 


1 

2 

3 

4 

5 


. text 
. even 

L5_port0: .long 32768,32768,32769,32770,32770,32771 ; Struct PIA portO 

. even 

L51_portl: .long 32800,32800,32801,32802,32802,32803 ; Struct PIA portl 


1 struct PIA /* Template for PIA */ 

2 { 

3 unsigned volatile char *data_a; /* 1/0 port A */ 

4 unsigned char *ddr_a; /* Data Direction register A*/ 

5 unsigned volatile char *control_a; /* Control register A */ 

6 unsigned volatile char *data_b; /* 1/0 port B */ 

7 unsigned char *ddr_b; /* Data Direction register B*/ 

8 unsigned volatile char *control_b; /* Control register B */ 

9 }; 


10 

11 

12 


13 


14 

15 


mai n() 
{ 


static const struct PIA portO = {(unsigned volatile char*)0x8000, 
(unsigned char*)0x8000, (unsigned volatile char*)0x8001, 
(unsigned volatile char*)0x8002, (unsigned char*)0x8002, 
(unsigned volatile char*)0x8003}; 
static const struct PIA portl = {(unsigned volatile char*)0x8020, 
(unsigned char*)0x8020, (unsigned volatile char*)0x8021, 
(unsigned volatile char*)0x8022, (unsigned char*)0x8022, 
(unsigned volatile char*)Ox8023}; 
void initialize(struct PIA); 
ini ti alize(portO); 

. even 


7: 

_mai n 

adda.1 

#-24,sp 

; Prepare to push 24 bytes 

8: 


move.1 

#L5_port0,-(sp) 

; i.e. the six pointers of portO 

9: 


move.1 

#24,dO 

; out onto the System stack 

10 


jsr 

a~pushstr 

; Using this library subroutine 

11 


jsr 

_i ni ti al i ze 


12 


1 ea 

24(sp) , sp 

; Restore the Stack Pointer 

* 

16 

i ni ti alize(portl); 


13 


adda.1 

#-24,sp 

; Repeat above to pass struct PIA 

14 


move.1 

#L51 portl,-(sp) 

; portl to initializeO 

15 


move.1 

#24,dO 


16 


jsr 

a~pushstr 


17 


jsr 

_i ni ti al i ze 


18 


1 ea 

24(sp) , sp 


* 

17 /'•'■ 

Main body sends out of portl' 

s B reg the sum of portO & l's A r 

* 

18 

*(portl.data_ 

_b) = * (portO. data 

_a) + *(portl.data_a); 

19 


movea.1 

L51_portl+12,al 

; Point Al to portl's data_b reg 

20 


movea.1 

L5 portO,a2 

; Point A2 to portO's data_a reg 

21 


moveq.1 

#0, d7 


22 


move.b 

(a2),d7 

; Get portO's data_a byte 

23 


movea.1 

L51 portl,a2 

; Point A2 to portl's data_a reg 

24 


moveq.1 

#0, d6 


25 


move.b 

(a2),d6 

; Get portl's data_a byte 

26 


add. 1 

d6,d7 

; Add them 

27 


move.b 

d7,(al) 

; and send to portl's data_b reg 

28 


rts 




V 


19 } 

20 /* Function sets up a PIA as a simple input A and output B port */ 
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Table 9.6 (continued,) The PIA as a structure of pointers. 

21 void initialize(struct PIA port) 

22 { 


29 


. even 



30 

initialize: link a6,#0 



* 

23 

*(port.control_a)=0; 



31 


movea.l 16(a6),al 

; Get control_a passed on stack 

at A6+16 

32 


clr.b (al) 

; and clear it 


* 

24 

*(port.ddr_a)=0; 



33 


movea.l 12(a6),al 

; Get ddr_a passed on stack at 

A6+12 

34 


clr.b (al) 

; and clear it 


* 

25 

*(port.control_a)=04; 



35 


movea.l 16(a6),al 

; Get control_a again 


36 


move.b #4,(al) 

; Make it 00000100b 


* 

26 

*(port.control_b)=0; 



37 


move.l 28(a6),al 

; get control_b passed on stack 

at A6+28 

38 


clr.b (al) 

; Clear it 


* 

27 

*(port.ddr_b)=0xFF; 



39 


movea.l 24(a6),al 

; Get ddr_b passed on stack at 

A6+24 

40 


move.b #-l,(al) 

; Make it FFh (i.e. -1) 


* 

28 

*(port.control_b)=04; 



41 


movea.l 28(a6),al 

; Get control_b again 


42 


move.b #4,(al) 

; Make it 00000100b 


43 


unlk a6 



44 


rts 



45 


.globl _mai n 



46 


.globl initialize 



47 


.globl a~pushstr 



* 

29 

} 




(b) Resulting assembler code. 

are generally smaller than arrays are likely to be, and presumably for this reason 
the less efficient pass by value copy technique is used. In Table [9761 b) this is done 
by moving the System Stack Pointer down 24 bytes (6 x 4-byte pointers), pushing 
the base address of the structure on to the System stack, the byte size in DO.L, 
and using the machine library subroutine (see Section 9.4) a~pushstr to do the 
moving, lines 7-12 and 13-18. 

Just like a simple object, a structure's address can be passed instead. This is 
the more efficient method of passing a structure to a function. Thus to pass the 
medical record of pati ent [6] to a function store(), which will store it on disk, 
we could use the calling statement: 

store(&patient[6]); 

where pati ent [6] is the name of the structure and &pati ent [6] its address. 

The sizeof operator will give the size of the whole structure. This may be 
greater than the total of the individual elements, as some machines enforce stor¬ 
age boundaries, which effectively pads out elements with holes. For example in 
the 68000 family, non-byte objects normally begin at even addresses. An exam¬ 
ple of this is shown in the use of the . EVEN assembler directive in lines 2 and 4 of 
Table [T6l The & operator (i.e. address-of) can also be used to generate a pointer 
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to any element in a structure; thus &pati ent [6] . hosp_number is the address of 
the unsigned long object hosp_number lurking inside structure patient [6], 
the latter being a composite object structured as declared by the med_record 
template. 

If we pass a pointer as a parameter to a function, then it must be declared 
in the function declaration and heading as being a pointer to a particular object; 
thus for the store() function we would have: 

void store(struct med_record * ptr) 

which says that ptr is a pointer to a structure type med_record, as passed to 
store (). Another example is seen in line C14 of Table [9771 where we are passing 
a pointer to a structure of type PIA (in line C21). 

Given that a function has received a pointer declared to be to a structure, how 
is it to get at the individual elements? Well, the contents of the pointer to a 
structure are deemed to be the same as the structure’s name. Thus in: 

x = (* ptr).hosp_number; 

x will take on the value of the first element of a structure type med_record, 
passed to a function using a pointer. Thus (*ptr) is the equivalent of med_record, 
assuming that ptr is a pointer to that structure. The parentheses are needed as 
the structure member operator . has a higher precedence than the indirection 
* operator. The use of pointers to a structure is so common that C has a special 
Structure Pointer operator ->, which replaces the (*) . pair arrangement thus: 

x = ptr -> hosp_number; 

To compare the two methods of passing structures to functions, I have re¬ 
peated Table l976T a) in Table [9/71 but using pointers to structures. This time the 
function prototype in line C14 declares that the passed parameter is a pointer to 
a structure of type PIA. In line C21, I have named this pointer ptr_2_port, and 
in lines C23 - C28 the -> operator has been used to access the structure elements. 
Notice how the addresses of the two structures portO and portl are passed to 
i ni ti al i ze() in lines Cl5 and C16. 

Using the same compiler to process the C source of Table [7T71 gives 178 bytes 
as compared to 324 bytes resulting from Table liTTIT a). No longer do we need to 
call up a library function to pass a copy of the structure in its entirety. A similar 
advantage accrues when a function returns a pointer to a structure, as compared 
to an actual structure. 

If we use pointers to structures, then we can map the structure anywhere 
we want within the microprocessor's memory map, rather than placing it where 
the compiler wants to E|6). Thus, for our example of a PIA, we could define a 
structure comprising six char objects (not, as before, pointers to objects) and 
assign the pointer to this structure at the base address of the actual physical PIA. 
For example: 


| ^ ^ _ Casts 0x8000 as a pointer to a structure of template PIA 

&port0 = (struct PIA *)0x8000; 
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Cl: 
C2: 
CS: 
C4: 
C5: 
C6: 
C7: 
C8: 
C9: 

CIO 

Cll 

C12 


C13: 


C14: 


struct PIA 

{ 

unsigned 

unsigned 

unsigned 

unsigned 

unsigned 

unsigned 

}; 


Table 9.7 Sending pointers to structures to a function. 

/* Template for PIA 

volatile char *data_a; /* I/O port_A 

char *ddr_a; /* Data Direction register_A 

volatile char *control_a; /* Control register_A 

volatile char *data_b; /* I/O port_B 

char *ddr_b; /* Data Direction register_B 

volatile char *control_b; /* Control register_B 


mai n() 

{ 

static const struct PIA portO = {(unsigned volatile char*)0x8000, 
(unsigned char*)0x8000, (unsigned volatile char*)0x8001, 
(unsigned volatile char*)0x8002, (unsigned char*)0x8002, 
(unsigned volatile char*)0x8003}; 


static const struct PIA portl = {(unsigned volatile char*)0x8020, 
(unsigned char*)0x8020, (unsigned volatile char*)0x8021, 
(unsigned volatile char*)0x8022, (unsigned char*)0x8022, 
(unsigned volatile char*)0x8023}; 


V 

V 

V 

V 

V 

V 

V 


void initialize(struct PIA *);/* Declare ftn accepting a ptr to struct */ 


C15: initialize(&portO); 

C16: i ni ti al i ze(&portl) ; 


C17: /* Main body sends out of portl's B reg the sum of portO & portl's A reg */ 

C18: *(portl.data_b) = *(port0.data_a) + *(portl.data_a); 

C19: } 


C20: /* Function sets up a PIA as a simple input A and output B port */ 

C21: void initialize(struct PIA * ptr_2_port) 

C22: { 

C23: *ptr_2_port -> control_a = 0; 

C24: *ptr_2_port -> ddr_a = 0; 

C25: *ptr_2_port -> control_a = 04; 

C26: *ptr_2_port -> control_b = 0; 

C27: *ptr_2_port -> ddr_b = OxFF; 

C28: *ptr_2_port -> control_b = 04; 

C29: } 


states that the address of portO, previously used to name a structure of type PIA, 
is to be 8000h. As in previous pointer assignments, we must use a cast to convert 
the constant to the appropriate type, which in this case is pointer-to struct PIA. 

This gives us a major headache, as the Data and Data Direction registers share 
the same address, so our six structure members cannot all have unique addresses. 
The way around this problem is to use a union. A union is declared and initial¬ 
ized in the same way as a structure, but all union members occupy the same 
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place in memory. Consider the union template called share, in lines Cl -C5 of 
Tablc ffhHl a). Here the union has two members, ddr and data, both char (byte)- 
sized. Lines C6-C12 declare the structure PIA which has four char-sized mem¬ 
bers, two of which are unions of type share called a and b. As a union appears 
as one location only, this satisfies the physical criteria that a PIA occupies only 
four bytes of memory. 

An element in a union can be accessed by using the Dot operator in the same 
manner as a structure; for example b. ddr is the ddr object in a union called b. 
Where a union is buried within a structure, then the Dot operator can be used 
twice, thus portO.b.ddr is the object ddr inside union b inside struct portO. 
In Table [9781 we are passing a pointer to a structure of type PIA to the function, 
so the equivalent (in line C30) is pntr_2_port -> b. ddr. We see from Table l8.4l 


Table [9T8t Unions (continued next page). 

Cl: union share /* Template for shared DDR and Data registers */ 

C2: { 

C3: unsigned char ddr; 

C4: unsigned volatile char data; 

C5: }; 

C6: struct PIA /* Template for PIA */ 

C7: { 

C8: union share a; /* Shared registers A side */ 

C9: unsigned volatile char control_a; 

CIO: union share b; /* Shared registers B side */ 

Cll: unsigned volatile char control_b; 

C12: }; 

C13: main() 

C14: { 

C15: struct PIA *pntr_2_port0 = (struct PIA *)0x8000;/* portO's base @ 8000h*/ 

C16: struct PIA *pntr_2_portl = (struct PIA *)0x8020;/* portl's base @ 8020h*/ 

C17: void initialize(struct PIA *);/*Decl ftn taking ptr to struct type PIA */ 

C18: initialize(pntr_2_port0); 

C19: initialize(pntr_2_portl); 

C20: /* Main body sends out of portl's B reg the sum of portO & portl's A reg */ 
C21: pntr_2_port0->b.data = pntr_2_port0->a.data + pntr_2_portl->a.data; 

C22: } 

C23: /* Function sets up a PIA as a simple input A and output B port */ 

C24: void initialize(struct PIA * pntr_2_port) 

C25: { 

C26: pntr_2_port->control_a = 0; 

C27: pntr_2_port->a.ddr = 0; 

C28: pntr_2_port->control_a = 04; 

C29: pntr_2_port->control_b = 0; 

C30: pntr_2_port->b.ddr = OxFF; 

C31: pntr_2_port->control_b = 04; 

C32: } 


(a) C source code. 
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Table [978t Unions (continued next page). 


*1 union share /* Template for PIA 

*2 { 

*3 unsigned char ddr; 

*4 unsigned volatile char data; 

*5 }; 

*6 struct PIA /* Template for PIA 

*7 { 

*8 union share a; /* Shared registers A side 
*9 unsigned volatile char control_a; 

*10 union share b; /* Shared registers B side 
*11 unsigned volatile char control_b; 

*12 }; 

*13 main() 

*14 { 

1: .text 

2: .even 

3: _main: link a6,#-12 

*15 struct PIA *pntr_2_port0 = (struct 
4: move.l #8000h,-4(a6); 

*16 struct PIA *pntr_2_portl = (struct 
5: move.l #8020h,-8(a6); 

*17 void initialize(struct PIA *); 

*18 initialize(pntr_2_port0); 

6: move.l -4(a6),(sp) ; 

7: jsr initialize ; 

*19 initialize(pntr_2_portl); 

8: move.l -8(a6),(sp) ; 

9: jsr initialize 

*20 /* Main body sends out of portl's B 
*21 pntr_2_port0->b.data = pntr_2_port0 


V 

V 

V 

V 


PIA *)0x8000; /* portO's base @ 8000h 
Pointer to portO stored at A6-4 
PIA *)0x8020; /* portl's base @ 8020h 
Pointer to portl stored at A6-8 


Push out portO's address on stack 
to pass to function initialize() 

Repeat for portl's address 

reg the sum of portO & portl's A reg 
->a.data + pntr_2_portl->a.data; 


V 


10 

move.1 

-4(a6),al 

; Point Al to 

portO 


11 

move.1 

-4(a6),a2 

; Also A2 



12 

moveq.1 

#0, d7 




13 

move.b 

(a2),d7 

; Get portO's 

data_a byte at 

8000h 

14 

move.1 

-8(a6),a2 

; Point A2 to 

portl 


15 

moveq.1 

#0, d6 




16 

move.b 

(a2),d6 

; Get portl's 

data_a byte at 

8020h 

17 

add. 1 

d6, d7 

; Add them 



18 

move.b 

d7,2(al) 

; Send result 

to portO's data_b regi 

19 

uni k 

a6 




20 

rts 





*22 } 





*23 /* Function sets up a 

PIA as a 

simple input A anc 

output B port 



V 


24 void initialize(struct PIA * pntr_2_port) 

25 { 


21: .even 

22: initialize: link a6,#0 

*26 pntr_2_port->control_a = 0; 

23: move.l 8(a6),al; Point Al to port base address passed in A6+8 

24: clr.b l(al) ; Clear base+1, that is Control reg A 
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Table 9.8 (continued) Unions. 

*27 pntr_2_port->a.ddr = 0; 


25 


move.1 

8(a6),al 

26 


clr.b 

(al) 

*28 

pntr_2_port->control 

_a = 04; 

27 


move.1 

8(a6),al 

28 


move.b 

#4,l(al) 

*29 

pntr_2_port->control 

_b = 0; 

29 


move.1 

8(a6),al 

30 


clr.b 

3 (al) 

*30 

pntr_2_port->b.ddr = 

OxFF; 

31 


move.1 

8(a6),al 

32 


move.b 

#-1,2(al) 

*31 

pntr_2_port->control 

_b = 04; 

33 


move.1 

8(a6),al 

34 


move.b 

#4,3(al) 

35 


uni k 

a6 

36 


rts 


37 


.globi 

_mai n 

38 


.globi 

_i ni ti al i ze 

*32 

} 



(b) Resulting assembler code. 


; Again (bad minimization!) 

; This time clear base; that is DDR A 

; Yet again! 

; Make Control reg A = 00000100b 
; and again! 

; Base+3 is control reg B 
; And yet again! 

; Base+2 is DDR B, make it 11111111b 
; and again! 

; Make Control B = 00000100b 


that both the -> and . operators have the same precedence, and associate right 
to left, so parentheses are not required. 

In Table 19781 I have not defined the structure as being static or const, as 
opposed to Tables [9iH and fh71 This leads to the structure addresses being stored 
in the frame (assembly lines 4 and 5) at run time rather than being in absolute 
memory at load time (static). The qualifier const would not change this, but 
would produce a warning if the program tried to meddle with these addresses. 
Compiling this source produced 130 bytes of machine code. 

Although the procedure outlined in Table 19781 seems best, there can be prob¬ 
lems. The resultant assembly code has located the four elements at Base (line 26), 
Base+1 (line 24), Base+2 (line 32) and Base+3 (line 34), that is at sequential ad¬ 
dresses. However, many compilers will pad out elements to begin at even ad¬ 
dresses. Indeed the circuit of Fig. 13.141 shows the PIA elements located at four 
sequential even addresses (eight bytes), as address line a 0 is not provided by the 
68000 MPU. Most compilers permit various alternative storage configurations for 
structures, and with collusion with the hardware engineer, a suitable scheme can 
be devised. Nevertheless, the awareness of hardware circuitry intruding on soft¬ 
ware matters leads to portability problems if the circuitry or/and processor is 
changed. 

One final note on pointers to structures. If arithmetic is attempted on such 
objects, then one is taken to be the size of the structure. For example: 


pointer = pntr_2_port0 + 1; 
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would give a value to poi nter of four more than pntr_2_port0 (assuming no 
holes in the structure). This could be exploited if a system has a multitude of, say, 
PIAs stored sequentially, which could then be treated as an array of structures. 
Of course, poi nter would have to be defined as a pointer-to structure type PIA, 
before being used in this way. 


9.4 Headers and Libraries 

We have already seen that for clarity at assembly-level, it is better to name con¬ 
stant objects at the head of the program module. Thus, as an example, the loca¬ 
tions of the counter and interrupt flag are named as COUNTER and INT_FLAG in 
lines 11 and 12 of Table lfT2l b). The . DEFINE (some assemblers use EQU) directive 
is used to replace any susequent occurrences of these identifiers by the constants 
9000 h and 9080 h respectively (e.g. lines 22 and 23). 

As well as clarifying the source code, grouping all such definitions as a header, 
makes changing the program to reflect hardware alterations easier. Thus if 
INT_FLAG was subsequently moved to 9002 h, then only the header definition 
line need be changed, and the program reassembled. In a large program, chang¬ 
ing perhaps 20 references to 9080 h is, at the very least, tedious and error prone. 

Where a large modular program is being developed, the likely complex header 
can be written as a separate file and included at the top of each module using the 
. INCLUDE (or equivalent) directive. For example: 

.include "hardware.h" 

Header files are conventionally given a . h suffix. 

Although the . DEFINE directive is normally used as a straight text replacement 
mechanism, some assemblers permit more sophisticated processing. For exam¬ 
ple line 7 of Table EZ] used .DEFINE to evaluate an expression mathematically, 
which was then used to substitute for the delay parameter's name N. 

The C language extends the principle of headers by using a preprocessing stage 
to evaluate directives, which in all cases are identified with a leading # charac¬ 
ter. Conceptually the preprocessor is a separate program fronting the compiler 
proper and sometimes, physically, is as shown in Fig. 17.51 

Some typical substitutions are: 

#define TRUE 1 
#define FALSE 0 
#define ERROR -1 
#define F0REVER_D0 for(;;;) 

#define I/0_P0RT (char*)0x8000 
#define PYE 22/7 

Conventionally, the token which is to be replaced is capitalized. It must be 
separated from both #defi ne and the replacement text by at least one space 
or tab. The replacement text is everything from this point to the end of line. 
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Table 9.9 Using #defi ne for text replacement. 

#define FOREVER_DO while(1) 

#define ANALOG_PORTX (char*)0x6000 

#define ANALOC_PORTY (short*)0x6002 

short Array[256]; /* Global array of 256 words */ 

/* The background routine is defined here */ 

void display(void) 

{ 

register unsigned char x_coord; /* The x co-ordinate */ 

char *const dac_x = ANALOG_PORTX; /* 8-bit X-axis D/A converter */ 

short -'const dac_y = ANALOG_PORTY; /* 12-bit Y-axis D/A converter */ 

F0REVER_D0 /* Do forever */ 

{ 

*dac_y = Array[x_coord]; /* Get array[x] to Y plates */ 

*dac_x = x_coord++; /* Send X co-ordinate to X plates */ 

} 

} 

/* The foreground interrupt service routine is defined here */ 

#define COUNT_PORT (short*)0x9000 

#define INTERRUPT_FLAG (char*)0x9080 

void update(void) 

{ 

static unsigned short last_time; /* The last counter reading */ 

static unsigned char update_i; /* The array update index */ 

short * const counter = C0UNT_P0RT; /* The counter is at 9000/lh */ 

char * const int_flag = INTERRUPT_FLAG; /* The external interrupt flag */ 

*int_flag = 0; /* Reset external interrupt flag */ 

Array[update_i++] = *counter-last_time; /* Difference is new array value */ 

last_time = *counter; /* Last reading is updated */ 

} 


Some compilers insist that the # character begins the line (no spaces) and that 
there is no whitespace between the # and def i ne. Table |9/3l repeats l9.5t a) using 
a header for each module. Notice how the addresses, including casts, are named. 

The #def i ne directive can do more than simple text and mathematics substi¬ 
tutions, it can be used to define macros with arguments, rather like in Table [7761 
Consider the definition: 

#define MAX(X,Y) ((X)>(Y) ? (X) : (Y)) 

Although MAX(X, Y) looks like a function call, any reference to MAX later expands 
into in-line code, for instance: 

temperature = MAX(tl,t2); 

Here X will be replaced by tl and Y by t2 giving the equivalent: 

temperature = ((tl)>(t2) ? (tl) : (t2)); 
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Notice how the macro definition was carefully parenthesized to avoid prob¬ 
lems with complex parameter substitutions. For example: 

#define SQR(X) X*X 
y = SQRC1+7); 

will result iny = l+ 7*l+7; which is 9, rather than 64! The solution is to define 
SQR(X) as: 

#define SQR(X) (X)*(X) 

which results in y = (1 + 7) * (1 + 7); which is 64, as desired. 

When defining macros, there must be no space between the macro name and 
the opening (, otherwise simple substitution will occur; thus: 

#define SQR (X) (X)*(X) 

causes y = SQR(z); to become y = (z)(z) * (z);! 

Care should be taken in defining very complex macros, especially using the ++ 
and - - operators, as their expansion with compound operators can be difficult to 
predict. Any macro or text substitution can be undefined subsequently by using 
the #undef directive. 

Most of the remaining preprocessor directives involve conditional compila¬ 
tion |7||. Consider the following code fragment: 

#ifndef MPU 

#error Microprocessor type not defined 
#endif 

#if MPU == 68K 

typedef short WORD; 
typedef int LONG_WORD; 

#eiseif MPU == 6809 

typedef int WORD; 
typedef long L0NG_W0RD; 


#el se 

#error Unknown microprocessor type 
#endif 

There are quite a number of new keywords used in this example. The purpose 
is to introduce two new types of C objects, namely WORD and L0NG_W0RD, rather 
than use char, int etc. The C operator typedef allows the writer to use syn¬ 
onyms for object types of any complexity. For example the FILE type available 
to hosted C compilers to open, close, write to or read from a named file on disk 
is a synonym for a complex structure type. 

Now to make the types WORD and L0NG_W0RD portable, the underlying base 
type must be chosen according to the target processor. Thus for example, i nt 
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is a 16-bit word in most 6809 and 8086 compilers, and usually 32 bits for 68000 
and 80386 target compilers. Our example defines the WORD and L0NG_W0RD types 
differently according to the state of the variable MPU, which has been set by the 
operator prior to using the compiler. For example in the MSDOS operating system: 

SET MPU = 68K 

in the startup autoexec. bat file will do the needful. Alternatively putting a 
#def i ne MPU = 68K in the first line of the header would have the same effect. 

The #i fndef of the first line says that if MPU is not defined, print an error 
message (#error). The #endif that follows, closes this sequence down. The 
#if, #elseif and #else directives that follow, have their obvious meanings, 
and delimit one of three actions depending on the state of MPU. 

The #i ncl ude directive is used to read in a specified file at the point at which 
it occurs. In C there are two versions, for example: 

#include "hardware.h" 

#include <hardware.h> 

In the case of the former, the preprocessor assumes that the file hardware. h is in 
the same directory as itself. In the latter, various other specified directories are 
searched as well, usually a special header subdirectory. The details are compiler 
dependent. Usually the quotes version is used for your own private include files, 
whereas the angle bracket form specifies standard library header files. Of course, 
files other than headers may be included, such as other C source programs. 

All C compilers come with a set of libraries, which give the writer facilities to 
do complex mathematics, input and output routines, file handling, graphics etc. 
These libraries consist of a number of functions (see Table [9710) in object code 
form, together with a dictionary. Such libraries are added to the li nk er's co mm and 
line as shown in Fig. l7.5t however, the linker does not treat a library object file 
in the same way as a normal program object file. Rather than adding all the 
object code in a library to the existing code, only functions which are referred 
to and declared extern by the user's modules are extracted. Thus functions are 
selectively added. 

Most compilers come with a librarian utility. This allows the programmer to 
make up a library of his/her own favorite functions or, more dangerously, alter 
the commercial ones. The linker scans libraries in the order they are named in its 
command line; thus it is possible to replace unsatisfactory commercial functions 
by home-brew ones. 

Old C did not specify a standard library, although many of the more common 
functions became a de facto part of the language. The ANSI standard does specify 
a de jure co mm on core library (g), but most compilers have additional libraries 
to deal with operating system-specific functions, graphics, communications etc. 

In general the standard libraries are only relevant in a hosted environment. 
In a free-standing situation, such as met in embedded microprocessor targets, 
many library functions are either irrelevant or require modification. 

Most compilers that are not operating-system specific use libraries at several 
levels. The lowest of these is the machine library, which holds basic subroutines 
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which the assembly-level source code can use without the writer at the high-level 
being aware of their existence. Thus, for example, an integer multiplication in a 
6809 MPU target requires a i6 x 16 operation, although the processor itself has 
only an integral 8 x 8 MUL instruction. It is likely that the C-originated assembly 
code will include a JSR to the requisite subroutine held in the machine-level li¬ 
brary. An example of this is given in line 10 of Table 19751 where the subroutine 
a~pushstr is used by the compiler to implement the passing of a structure to a 
function (see also Table [T4~6l line 115). 

The next up in the hierarchy of libraries provides low-level support routines 
used by the user callable libraries, and includes all the operating-system inter¬ 
face routines. For example, they may contain subroutines to obtain a character 
from a terminal (typically called i nch for input character) and to output a single 
character (typically outch for output character). The actual code here depends 
on the hardware. In a non-hosted environment, the writer will alter such routines 
to suit the system. 

The user-callable libraries contain all the functions which may be explicitly 
called from the C program. These are the ANSII standard libraries and the various 
high-level options, such as graphics. Such libraries make use of the low-level 
support library when interacting with the environment. 

Variations, include optional integer libraries (suitable for embedded applica¬ 
tions where the normal floating-point functions may not be required) and libraries 
coded to make use of mathematics co-processors. 

Given that libraries comprise a number of functions external to the user's 
program, such functions that are to be called must be declared extern and pro¬ 
totyped in the normal way. To avoid this chore, compilers come with a number 
of standard header files which may be #i ncl uded as appropriate at the head of 
the user program. Table [9H0l shows the header file math.h provided with the 
Cosmic/Intermetrics cross 6809 C compiler V3.3F, as an example. This declares 
most of the standard ANSII maths library. As can be seen, the majority of maths 
functions take doubl e float arguments and return a doubl e float value. 

This header file is designed to be used by several related compilers. If the 

variable _PR0T0 has been defined, then any text of the form (a) will be replaced 

by just a: 

#ifdef _PR0T0 

#define (a) a 

#el se 

#define (a) () 

#endif 

For example, on this basis the first True line will be converted by the preprocessor 
to: 

double acos (double x); 

which is the normal ANSII C function prototype. However, if _PR0T0 is not de¬ 
fined we will get: 
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double acos (()); 

which is suitable for an old C-style compiler, which does not support prototyping. 

Notice how the internal variable_MATH_is defined at the top of the header. This 

lets subsequent headers know that the math . h header is present. 

Finally, the ANSII committee have authorized the #pragma directive, as a prag¬ 
matic way of introducing compiler dependent directives, which may be anything 
the compiler writer wishes. An example of this from the same compiler is: 

#pragma space [] @ dir 

which instructs the compiler to store (i.e. space) all non-auto data objects (des¬ 
ignated []) in direct memory. That is, use the Direct address mode for static 
and extern data objects instead of the default Extended Direct addressing mode. 


Table 9.10 A typical math. h library header (with added comments). 

/* MATHEMATICAL FUNCTIONS HEADER 

* copyright (c) 1988 by COSMIC 

* copyright (c) 1984, 1988 by Whitesmiths, Ltd. 

V 

#ifndef _MATH_ 

#define _MATH_ 1 

/* set up prototyping */ 

#ifndef_ 

#ifdef _PR0T0 

#define (a) a 

#else 

#define (a) () 

#endif 
#endif 


/* function declarations */ 

double acos _((double x)); /* Computes the radian angle, cos of which is x */ 

double asin _((double x)); /* Computes the radian angle, sine of which is x */ 

double atan _((double x)); /* Computes the radian angle, tan of which is x */ 

double atan2 _((double y, double x)); 

/* Computes the radian angle of y/x. If y is -ve the result is -ve. 

If x is -ve the magnitude of the result is greater than pi/2 */ 

double ceil _((double x)); /* Computes the smallest integer >=to x */ 

double cos _((double x)); /* Computes the cosine of x radians, range [0,pi] */ 

double cosh _((double x)); /* Computes the hyperbolic cosine of x */ 

double exp _((double x)); /* Computes the exponential of x */ 

double fabs _((double x)); /* Obtains the absolute value of x */ 

double floor _((double x)); /* Computes the largest integer <= x */ 

double fmod _((double x, double y));/* Computes the floating-pt remainder of x/y */ 

double log _((double x)); /* Computes the natural logarithm of x */ 

double loglO _((double x)); /* Computes the common logarithm of x */ 

double modf _((double value, double *pd)); 

/* Extracts the integral and fractional parts */ 

double pow _((double x, double y)); /* Raises x to the power of y */ 

double sin _((double x)); /* Computes the sine of x rads, range [-pi/2,pi/2] */ 

double sinh _((double x)); /* Computes the hyperbolic sine of x */ 

double sqrt _((double x)); /* Computes the sqr root of x; if x -ve returns 0 */ 

double tan _((double x)); /* Computes the tan of x rads, range [-pi/2,pi/2] */ 

double tanh _((double x)); /* Computes the hyperbolic tangent of x */ 

int abs _((int i)); /* Obtains the integer absolute value of i */ 


#endif 
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Obviously this is very target specific, and considerations of this nature are the 
subject of the next chapter. 
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CHAPTER 1 0 


ROMable C 


In the last two chapters we have seen that it is possible to take a source program 
written in C and compile to assembly level. This assembly code can then be 
linked and converted into a machine code file, ready for loading, as described 
in Chapter 7. In that chapter, we observed that the environment of a hosted 
computer is very different to that of a naked system. In the former situation, 
each operator request for a program run causes the relevant machine-code file 
to be loaded into computer RAM (usually from disk) and execution to commence. 
In a naked system the program is normally permanently resident in ROM. Thus 
the initializing loading stage is eliminated. A compiler producing code which can 
be run in ROM is known as a ROMable compiler. 

At the very least, a ROMable compiler must provide the means to put pro¬ 
gram code and constant data in one section of memory (i.e. ROM), and variable 
data in another (i.e. RAM). However, there remain several other problems to over¬ 
come before a high-level sourced program can successfully run in a naked system. 
Typical of these are the means to set up the System stack, Reset and Interrupt 
vector tables, link in hand-assembled routines and implement exception service 
routines. Handling MPU-specific tasks, such as setting interrupt mask bits in the 
Status/CCR register, and portability issues raise their heads. 

Most of these hardware-related activities are handled by the operating system, 
but in a free-standing environment the programmer must provide such services 
as are required by the executing software. In this chapter we examine this aspect 
of software design in more detail. 


10.1 Mixing Assembly Code and Starting Up 

As far as a microprocessor is concerned, life begins after a Reset. It is the respon¬ 
sibility of the design engineer to ensure that the various fixed restart vectors are 
in their predetermined place prior to this event. This ensures that the MPU can 
make it to the start of the executable program. There are several other chores 
that must be performed before the processing proper can commence. Typically, 
bits must be twiddled in the Status/Code Condition register, such as the Interrupt 
Mask and State flags. Dynamic exception vectors must be loaded, stacks set up 
and Page/Segment registers initialized. C is a powerful language, but its power 
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does not extend down into specific machine registers. 

As C cannot make itself a congenial environment, we must do this for it by 
writing a startup routine in a native assembly code and use the linker to join the 
two together in holy matrimony (2. 

Ignoring interrupts, which we will cover in the next section, we have three 
tasks to perform: 

1. Set up the System Stack Pointer to the top of System stack. 

2. Ensure that the Restart vector is in the appropriate ROM location. 

3. Go to the C program. 

Table fTOOl shows a possible implementation. Three source listings are given. 
The startup routine proper simply puts a suitable address into the System Stack 
Pointer and jumps to the subroutine/function created by the C compiler. Tradi¬ 
tionally this is named mai n(). At assembler level the Whitesmiths group com¬ 
pilers transform function names with a leading underscore, hence _mai n. This is 
not universally the case; for example a leading period or lagging underscore are 
common. However named, normally the main C function is written as an endless 
loop, and thus there will not be a return. In the illustrative situation depicted in 


Table lTTm Elementary startup for a 6809-based system (continued next page). 


.processor m6809 


Startup code for non-interrupt system 
Assumes RAM up to 07FFh 


_Start: 


external 

_mai n 

Ids 

#0800h 

jsr 

_mai n 

bra 

_Start 

pub!ic 

_Start 

end 



_main is outside this file 
Point Stack Pointer to top of RAM 
Go to C code 

Should it return then repeat 
Make this routine known to linker 


(a) Startup source executable code, startup.s. 


.processor m6809 




Vector table, Reset vector only 


.external _Start ; Start is outside this file 

.word [6] ; Miss out the interrupt vectors 

RESET: .word _Start ; Put restart address here 

. end 


(b) The source vector table, vector. s. 


mai n() 

{ 

static int i; 
while(l) {i++;} 
} 


(c) A dummy C function, f red. c. 
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Table 10.1 (continued! Elementary startup for a 6809-based system. 

1 ; Compi1ateur C pour MC6809 (COSMIC-France) 

2 .list + 

3 .processor m6809 

4 .psect _bss 

5 0001 00 00 L3_i: .byte 0,0 


6 ; 

7 ; 

8 ; 

9 ; 

10 

11 

12 E000 

13 E004 

14 E007 

15 


* Startup code for non-interrupt system 

* Assumes RAM up to 07FFh 


.psect _text 
.external _main 

10CE0800 _Start: Ids #0800h 

BDE009 jsr _main 

20F7 bra _Start 

.public _Start 


_main is outside this file 
Point Stack Pointer to top of RAM 
_main is outside this file 
Should it return then repeat 
Make this known to the linker 


16 



i 

man n() 


17 



2 


{ 


18 



3 


static 

i nt i ; 

19 



4 


whi 1 e(l) 

20 

E009 

7C0002 

_main: 

i nc 

L3_i+1 

21 

E00C 

2603 



jbne 

L4 

22 

E00E 

7C0001 



i nc 

L3_i 

23 

E011 

20F6 


L4: 

jbr 

_mai n 

24 


; 

5 


} 


25 





.public 

_mai n 


{i++;} 


26 

27 

28 

29 

30 FFF2 

31 FFFE E000 

32 


Vector table, Reset vector only 


.external _Start 
.word [6] 

RESET: .word _Start 

. end 


Start is outside this file 
Miss out the interrupt vectors 
Put restart address here 


(d) Resulting code. 


Table HITTl c). this is a (rather useless) counter function, continually incrementing 
the variable i. 

The vector table in the 6809 MPU lives in a different region of memory from the 
program text, and for this reason is written in Table [Tim as a separate module. 
At link time, it will be put into the appropriate location. 

The resulting linked assembly code was produced using the Cosmic/Intermetrics 
6809 C cross-compiler version 3.31, with the linker set thus: 

lnk09 +data - bl -i-text -bOxEOOO startup.o fred.o +text -b0xFFF2 vector.o 

The startup. s and vector. s files are assembled to their relocatable object ver¬ 
sions startup. o and vector. o. The compiler then converts f red. c to f red. o 
and li nk s startup.o to this code followed by vector.o. startup begins at 
EOOOh (+text -bOxEOOO) and vector at FFF2h (+text -b0xFFF2), where we are 
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assuming ROM between E000 h and FFFFh (e.g. a 2764 EPROM). The code in Ta¬ 
ble llO.ll d) shows everything in its proper place. 

If desired, the Vector module could be written in C and compiled to vector. o 
before linking. A possible C routine with the same role as Table HOTTl b) is given 
in Table 110.21 This is an example of an array of pointers to functions, where 
vector [n] is a const pointer to function n (the name of a function is its address, 
thus mai n is the pointer to function mai n ()) 0. Only the Reset vector is shown; 
to expand the function to include interrupt vectors just replace the null pointers 
by the root name of the appropriate handler function (see Table [To77l >. 

ANSII C permits a pointer of any kind to be assigned the constant zero, as is 
done on line C2 of Table IT072l that is a void or null pointer. No legitimate data 
should be held at this address (i.e. OOOOh) 0. For this reason, the linker script 
above, which assumed memory between OOOOh and 07FFh (e.g. a 6116 RAM), 
started the data bias at 0001 h (+data -bl) rather than the more obvious 0000 h bias. 

Starting up a 68000 MPU-based system can be done in the same way as for 
the 6809, with a separate text bias for the Vector and Startup routines, typically 
00000 h and 00400/t respectively. However, as the program usually directly fol¬ 
lows the Vector table, a composite Vector/Startup module may be created and 
linked in at zero. As shown in Table ITorl the User Stack Pointer is setup and the 
state changed to User before entering the main C routine (see also Table ITTTol i. 

Tables HOTl and [TO. 3l are simple examples of incorporating assembly routines 
with C code. They are elementary because no data is explicitly passed between 
them. It would have been quite easy to pass the value, say, i = 6 to mai n () in 
Table llO.Tl but we would have to know how the C compiler handles such variables, 


Table 10.2 Using arrays of pointers to functions to construct a vector table. 


extern StartO; 

(* const vector[])() = {0,0,0,0,0,0,Start}; 


(a) C source code. 


1 

2 

B 

4 

5 

6 


Compilateur C pour MC6809 (COSMIC-France) 


.list + 
.processor m6809 
.psect _text 


1 extern StartO; 


2 (* const vector[])() = {0,0,0,0,0,0,Start}; 


7 FFF2 0000 

8 FFF4 0000 

9 FFF6 0000 

10 FFF8 0000 

11 FFFA 0000 

12 FFFC 0000 

IB FFFE E000 


.vector: .byte [2] 


14 

15 

16 


.byte [2] 

.byte [2] 

.byte [2] 

.byte [2] 

.byte [2] 

.word _Start 

.public _vector 
.external _Start 
. end 


(b) Resulting code after the link. 
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Table 10.3 A simple Startup/Vector routine for a 68000-based system. 
—1WSL 3.0 as68k Thu Apr 13 14:45:17 1989 


1 

2 00000 OOOOaOOO 

3 00004 00000416 

4 00008 


.extern _main 
SSP: .long OxAOOO 
PC: .long _Start 

=.+0x3F8 


_main is outside this file 
Initial setting of SSP 
Restart PC value 
Skip up to 0400h 


5 00400 207c 00001000 _Start: 

6 00406 4e60 

7 00408 027c dfff 

8 0040c 4eb9 00000416 

9 00412 6000 fff8 
10 


movea.l #0x1000,aO 
move a0,usp 
andi.w #0xDFFF,sr* 
j s r _main 
bra _Start 
. end 


Fix to set-up User Stack Ptr 
Privileged instruction 
Bit 13 changes to User state 
Co to C routine 
Repeat if returns 


as each has its own house rules. In fact, that particular compiler would have 
expected i to be passed in Accumulator_D rather than through the System stack 
(see Table fllkfil . Thus in any particular compiler, a knowledge of its operation is 
needed in order to mesh the two successfully. 

Before giving an example, why use a mixture of the two languages, except 
for startup? It is an accepted rule of thumb that a program will spend some 
90% of its time in around 10% of the code a. Where time is of the essence, 
replacing this code by equivalent assembly-based subroutines will be beneficial. 
Another candidate for assembly code is the creation of library routines (see also 
Table 110.161 . As these will be used by many different projects, time spent in 
refining such code can be justified in some cases. 

Our example here involves the creation of a library subroutine to return the 
unsigned short int square root of an unsigned short int parameter. The 
function is to mi mic the C function: 

unsigned short sqroot(unsigned short); 

for the Whitesmiths 68000 C compiler version 3.2. 

The relevant house rules for this compiler are: 

1. Integral and pointer parameters are extended to four bytes and pushed onto 
the System stack least significant byte first. Where there is more than one 
parameter, then the compiler works along the list from right to left. 

2. Registers D3-D5 and A3-A7 are guaranteed unaltered by the function on 
return. 

3. Integral and pointer parameters are returned in D7.L. 

There are of course also rules for floating-point and structure objects. 

The algorithm implemented by Table l 1 0.4l uses Newton's numerical method 0]. 
This states that if we guess an initial value for yfx, usually \x, then: 

NEW_ESTIMATE = *(0LD_ESTIMATE - x/0LD_ESTIMATE) 
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Table 10.4 A C-compatible assembler function evaluating the square root of an unsi gned 1 nt. 
.processor m68000 


Calculates the square root to the nearest lower Integer 
using Newton's method where an original estimate of n/2 is made 
and successive estimates are = (old_estimate + n/old_estimate)/2 
Exit either after 20 iterations or when new and old estimates 
are the same 

EXAMPLE : Return for n = 18 is 4 

ENTRY : short unsigned int is passed on the Stack at SP+4/5 

EXIT : Return in D7.W as an unsigned short int, max 256 

EXIT : D0/D1/D2 and CCR altered 


J 

» 

_sqr_root: 

move.w 

4(sp),d7 

Copy n to D7.W 


cmp. w 

#l,d7 

n = 0 or 1? 


bhi 

CONTINUE 

IF higher than continue 


bra 

EXIT 

ELSE exit with answer = n 

CONTINUE: 

lsr.w 

#l,d7 

Create first estimate by dividing by 2 


move.w 

#19,dO 

19+1 iterations count in D6.W 

; After initialization repetitively build up new estimate in D2.L 

LOOP: 

move.w 

d7,dl 

Copy estimate into Dl.W 


clr.l 

d2 

Copy n into D2 as a 32-bit clone 


move . w 

4(sp),d2 

for the division following 


di vu 

d7,d2 

[D2.W] = n/old_estimate 


move.w 

d2,d7 

Move it to D7.W 


add .w 

dl,d7 

[D7.W] = old_estimate + n/old_estimate 


lsr.w 

#l,d7 

Divide by 2 to give the new estimate 


cmp. w 

dl,d7 

Compare new with old estimates 


dbeq 

dO,LOOP 

IF equal exit ELSE dec loop count; IF not 

EXIT: 

rts 


ELSE exit with answer in D7.W 


. publ i c 
. end 

_sqr_root 

Make known to the outside world 


and if we keep going round the loop, the estimate will converge to the desired 
value. In our listing, I have exited whenever NEW_ESTIMATE = OLD_ESTIMATE or 
when the number of interations reaches 20. The latter is necessary, as numerical 
techniques often produce an oscillating outcome (for example x = 65535 pro¬ 
duces an estimate alternating between 255 and 256), or even do not converge. 
Without an unconditional out, such functions may go into an unscripted endless 
loop. _ 

In Table 110.41 all variables are held in registers, so no frame is created and 
SP is used as the reference to obtain the passed variable x (MOVE.W 4(SP) ,D7). 
Furthermore, none of the preserved registers are used, therefore they do not 
require saving and retrieving. The answer is returned in D7 as required. 

Calling up the function from a C program is done in exactly the same way as 
any function actually written in C, for example: 

x = sqr_root(27U); 

where the suffix U indicates an unsigned type constant. Of course sqr_root() 
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Table 10.5 Using in-line assembly code to set up the System stack. 

mai n() 

{ 

static int i ; 

_asm("lds #0800h ; Point Stack Pointer to top of RAM")I 

while(l) {i++;} 

} 

(a) C source. 


; Compilateur C pour MC6809 (COSMIC-France) 




.list 

+ 





.processor m6809 





.psect _ 

bss 



L3_i : 


. byte 

[2] 



; 

1 

mai n() 




; 

2 


{ 





.psect _ 

.text 



j 

3 


static int 

i; 


; 

4 


_asm("lds 

sz 

o 

o 

00 

o 

Point Stack Pointer to top of RAM"); 

_main: 

Ids 

#0800h ; 

Point Stack 

Pointer to top of RAM 

» 

5 


whi1e(l) 

{i++;l 




i nc 

L3_i+1 





jbne 

L4 





i nc 

L3_i 



L4: 


jbr 

.mai n 




6 } 

.public _main 
. end 


(b) Resulting assembly code. 

will be external to the C program, so an extern declaration must be made in the 
normal way before sqr_root() is called, thus: 

extern unsigned int sqr_root(unsigned short); 

One of the disadvantages of using any high-level language is the loss of the 
ability to use any special feature of the underlying processor. For example, it 
may be necessary to lock out any interrupt occurring during a specific part of the 
code. How could we handle a 6809-based system with the requirement to stop 
at a specific point and use the SYNC instruction (see page 1163) to continue when 
an interrupt subsequently occurs? Of course, we could write the code as part of 
an assembly subroutine and link it in as previously shown, but this is not very 
efficient for short sequences. 

Many C compilers permit the insertion of assembly source lines interleaved in 
the C source code. Although this is a co mm on feature, it is not standard, and thus 
is very implementation dependent. Where it is available, the keyword asm is usu¬ 
ally involved. For example, the Aztec C compilers use #asm and #asmend to sand¬ 
wich such code. The Microtec equivalent uses a #p ragma asm - #p ragma endasm 
sandwich. Our illustration in Table H075l uses the Whitesmiths group built-in func¬ 
tion _asm() for this purpose. Here I have forced a LDS #0800h assembler line 
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in at the beginning of the C code. This obviates the need for the Startup mod¬ 
ule, but the Vector module must still be linked in. _asm() can take several lines 
of assembly code as its argument between double quotes, and use \n and \t to 
indicate New Line and Horizontal Tab respectively. 

The Microtec asm() can optionally use an assembly command to return data 
to a C object. For example: 

switch = asm(unsigned char, "move.b 9000h,d0"); 

which assigns the value read from 9000 h to an unsigned byte C variable. 

Despite its flexibility, assembler windows should be used sparingly, as it seri¬ 
ously compromises the portability of such code (see Section 10.4). 

It is possible to call a function whose absolute location is known from a C pro¬ 
gram, but which cannot be accessed in relocatable object form by the linker. This 
is likely to occur when the target system has a resident operating system/monitor, 
and the C user program wishes to use those external resources. Another situation 
which requires this facility, is where a preprogrammed mathematics package is 
resident, for example the 6839 floating-point ROM. 

As an example, assume that a certain ROM-based 6809-monitor has a subrou¬ 
tine called OUTCH (OUTput CHaracter) located at F830 h. This sends out a single 
character, passed to it in Accumulator_B, to the terminal. We wish to make use of 
this subroutine in implementing a C function which sends a character ch to the 
terminal whenever called. 

Now we noted on page 12811 (see also Table IToTTZT i. that in ANSII C the name 
of a function is a pointer to that function, that is its address. Thus, it might 
be thought that the statement (0xF830) (ch) ; would pass ch and jump to the 
subroutine at F830/i. However, 0xF830 is an integer constant so we must first 
cast it to type pointer-to a void function taking a single char parameter, that 
is (void(*) (char))0xF830. This complex cast reads from inside out: pointer 
to function (*)/ taking a char (char)/returning void. The whole is enclosed 
by the cast's parenthesis and qualifies the constant 0xF830. Note how the com¬ 
plex type reads from inside out first right then left. This is the normal way of 
constructing compound types. 

In Table [TTnil I have used a header to replace the name OUTCH by this casting. 
It would be normal to use a header to define the resources available in such a 
co-resident ROM. Thus the statement: 

(OUTCH)('\n'); 

translates in Table [ToTHI to: 

LDB #10 
1SR 0F830h 

as desired. 

Table 110.61 defines a function known as void new_iine(void) which is de¬ 
signed to send a Line Feed (ASCII code 10) to the ter mi nal. This simply in turn 
sends out ' \n' to OUTCH. The character ' \n ' is C'ese for New Line (or Line Feed). 
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As an alternative, an absolute value may be cast to char and passed to OUTCH; 
thus in this case OUTCH ((char) 10) is a direct equivalent to line C4, but rather 
less readable. Other useful C escape sequences (or tokens) for non-printable char¬ 
acters are \t for Horizontal Tab (ASCII code for HT is 9), \v for Vertical Tab (VT 
= 11), \b for Back Space (BS = 08), \r for Carriage Return (CR = 13), \f for Form 
Feed (FF = 12), \a for Audible alert (BEL = 7). 

Two points to notice concerning the coding. As previously mentioned, the Cos¬ 
mic/Intermetrics 6809 V3.3 compiler passes its first integral type parameter in 
its Accumulator_D rather than on the System stack. With a byte (char) parameter 
only, the right half of D is used, that is Accumulator_B. If OUTCH did not expect its 
parameter in this register, then a line of assembly code would be needed to match 
the C function parameter passing convention to that of the monitor subroutine. 
For instance, TFR B,A if OUTCH expected its parameter in Accumulator_A. Also, 
any registers which the C-function house rules say should be preserved should 
be saved before calling up the alien subroutine. This compiler does not make any 
assumptions concerning the return state of the 6809's registers. 

The function (OUTCH) () should not be declared in new_1 i ne (), as the use of 
a fixed address in the function call is an equivalent procedure. Neither should it 
be declared extern, as it will not be linked in. 


10.2 Exception Handling 

Interrupts and their software cousins are handled using the techniques discussed 
in the previous section. In order to process an interrupt correctly, the software 
must arrange for: 

1. The service routine start address to be in its correct vector location. 

2. Any registers not preserved by the compiler's function house rules to be saved 
and retrieved. 


Table 10.6 Calling a resident function at a known address. 
; Compilateur C pour MC6809 (COSMIC-France) 


2 





.list 

+ 

B 





.processor m6809 

4 





.psect 

_text 

5 


) 

1 

#define 

OUTCH 

(void(*)(c 

6 


J 

2 

void new_line(void) 

7 


> 

3 

{ 



8 


) 

4 

(OUTCH) C\n' 

); 

9 

E000 

C60A 

_ 

iew_line: 

ldb 

#10 

10 

E002 

BDF830 



jbsr 

0f830h 

11 


J 

5 

} 



12 

E005 

39 



rts 


13 





.public 

_new_line 

14 





. end 





EXCEPTION HANDLING 287 


Table 10.7 6809 startup for the system of Table [9 51 
.processor m6809 


Startup code for background displayO and IRQ entered updateO 
Assumes RAM up to 07FFh 


6 




.external 

.display, _update 

Both routines are outside 

7 

EOOO 

10CE0800 

_Start: 

Ids 

#0800h 

Stack Pointer to top of RAM 

8 

E004 

1CEF 



andcc 

#11101111b 

CLI 

9 

E006 

BDE00F 



jsr 

_di spl ay 

Go to background C code 

10 

E009 

20F5 



bra 

.Start 

If it returns, then repeat 

11 

EOOB 

BDE035 

IRQ_handl er: 

jsr 

.update 

Go do function updateO 

12 

E00E 

3B 



rti 


Exit IRQ handler 

13 




.public 

.Start, IRQ_handler 

Make known to linker 

14 





. end 




(a) Startup showing the IRQ handler routine. 

.processor m6809 


FFF2 

FFF8 EOOB 

8 FFFA 

9 FFFE EOOO 


Vector table, IRQ and vector only 


.external _Start, IRQ_handler 


IRQ: 


RESET: 


10 


.word 
.word 
.word 
.word 
. end 


[3] ; Miss out SWI2, SWI3 and FIRQ 

IRQ_handler ; Put IRQ handler address here 

[2] ; Miss out SWI & NMI 

_Start ; Put restart address here 


(b) Vector table showing the IRQ handler address. 


3. The service function to be terminated by a Return From Interrupt operation 
(e.g. RTI, RTE, IRET) rather than a Return From Subroutine (e.g. RTS, PULS PC, 
RET). 

Consider the program of Table 19.51 There are two functions here, the back¬ 
ground main function called di spl ay C) and the interrupt service function update (). 
Function update() is not explicitly entered, or indeed known, by background 
function di spl ay(); they communicate through global object Ar ray [], which is 
known to both of them. 

We look first at the 6809 processor and assume the use of IRQ to switch con¬ 
text. As the entire processor state is automatically saved, all our interrupt han¬ 
dler (IRQ_handl er in line 11 of Table ! I 0.71 a)) has to do is jump to the subroutine 
_update, and on return do a RTI. The address of IRQ_handler is placed in the 
IRQ vector in Table [TOTt h). Thus, when an IRQ interrupt occurs, the processor 
will save its state and go via the IRQ vector (FFF8:9h) to the stub IRQ handler in 
the startup routine. This simply jumps to the appropriate C function and termi¬ 
nates with a RTI. Notice that this startup routine clears the I mask bit in the CCR 
(line 8), which allows the MPU to respond to IRQ requests. The I mask has been 
automatically cleared after a Reset. 

The situation would be a little more complex if FIRQ were used to initiate the 
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Table 10.8 68000 startup for the system of Table \9f5 1 
-1WSL 3.0 as68k Wed Apr 19 15:45:50 1989 


Startup for background display and INT2 entered updateQ 


3 

4 




. extern 

_di splay 

.update v 

Both routines are outside 

5 

00000 

OOOOaOOO 

SSP: 

.long 

OxAOOO * 

Initial setting of SSP 

6 

00004 

00000426 

PC: 

.long 

.display 4 

Restart PC value 

7 

00008 




.=.+96 

V 

Co to Level2 int vector 

8 

00068 

00000416 

INT2: 

.long INT2_handler * 

Addr. of INT2 handler here 

9 

0006c 




.=.+916 


Move up to 400h 

10 

00400 

207c 

00001000 

.Start: 

movea.1 

#0x1000,aO* 

Make to set-up User Stack 

11 

00406 

4e60 



move 

aO,usp 


12 

00408 

46f8 

0100 


move.w 

0x0100,sr * 

User state, Int mask = 001 

13 

0040c 

4eb9 

00000426 

ENTER: 

jsr 

.display * 

Co to background C routine 

14 

00412 

6000 

fff8 


bra 

ENTER * 

Repeat if returns 

15 

00416 

48e7 

e3f0 INT2 

_handler 

:movem.1 

d0-d2/d6/d7/a0-a2,-(sp) 








* Save relevant regs 

16 

0041a 

4eb9 

00000466 


jsr 

.update 

* Co to INT2 service routine 

17 

00420 

4cdf 

0fc7 


movem.1 

(sp)+,d0-d2/d6/d7/a0-a2 








* Restore original state 

18 

00424 

4e73 



rte 



19 





. end 




exception. In this situation, only the PC and CCR are automatically saved. Thus 
the handler must use a Push/Pull pair to sandwich the JSR, in order to preserve 
the state. This is the situation for all 68000-based interrupts and the Push/Pull 
sandwich is clearly seen at INT2_handl er in lines 15 and 17 of Table IT0781 The 
house rules of this compiler (Whitesmiths V3.2) are such that registers D3 to D5 
and A3 to A7 are preserved in any C function, so only the remaining registers are 
saved by the handler. The three interrupt mask bits are set to 001 in line 12 to 
enable level-2 interrupts (they were set to 11 1 when the MPU was Reset). 

Both Tables [T0.7I and [10.81 are linked in with the C code in exactly the same 
manner as for the corresponding Tables [XHTl and HO.31 Software interrupts and 
exceptions are handled in the same way as hardware interrupts. Where interrupt 
vectors are stored in RAM rather than the normal ROM, the startup routine must 
dynamically load the address before enabling the interrupt mask. 

In a realistic system, the startup is likely to be more complex than these exam¬ 
ples show. For example, any programmable I/O devices should be configured be¬ 
fore enabling interrupts. If the exception service routine communicates through 
global variables and these are presumed to have an initial value, then this too 
should be done in the startup module. This will be described in the following 
section. 

The double-hop response to an interrupt slows down the MPU's response to 
a request. There are two ways around this problem. The first involves writing 
the interrupt service routine (ISR) entirely in assembly language; thus the handler 
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becomes the whole routine. If the ISR is of any size, it is likely that it will be in a 
separate file or library, and will be added in through the linker. 

Some compilers allow the programmer to specify a C function as an inter¬ 
rupt service routine. In such cases the generated assembly code includes an 
entry sequence that saves all used registers that the compiler's house rules state 
must be preserved. On exit these are returned and a RTI/RTE generated at 
exit. Like assembly windows, these are extensions to the ANSII standard and are 
highly compiler specific. As an example, the Mictrotec Research Paragon C cross 
to 68000/68020 V3 compiler requires such functions to be sandwiched by the 
{INTERRUPT directive, for example: 

#define {INTERRUPT 
< definition of function ifred() > 

#undef {INTERRUPT 

Function i f redO will then be coded as an interrupt service routine rather than 
a subroutine. As many functions as required may be sandwiched. 

To illustrate the effect of {INTERRUPT, consider the Real-Time Clock program 
of Table [8761 Compiling this as an interrupt service function with the Paragon 
compiler, gives the code in Table H(L9l Notice how the registers are saved and 
restored at the beginning and end of the routine, and the terminating RTE. In 
this situation, the address of clockO, that is .clock, should be placed in the 
appropriate vector, rather than that of an intermediate handler. 

The Whitesmiths group compilers versions 3. 3 and up , use th e prefix @port 
to specify interrupt service functions, see Tables llA6l and [l4. 121 Thus: 

©port void clock(void) 

{body} 

would give us the Real-Time Clock interrupt service function for these compilers. 

Using interrupts in high-level code is fraught with difficulties. Unlike assembly 
code, an interrupt will be serviced in the middle of a high-level instruction. If, for 
example, we had a global i nt variable i which was shared between background 
and foreground routines, then an interrupt in the middle of an instruction i ++ 
may well produce intriguing results, for instance: 

i ++ 

«< 


L4 

Here we have assumed a 6809 compiler with a 16-bit int. To increment this 
object, the lower byte has been increment ed fir st, and only if this rolls over to 
zero is the upper byte incremented (Table flOTl lines 20-22). If i was initially 
OOFFh, then the first INC produced i = 0000. If an interrupt now occurs, and 
the ISR used, say, i to update array [i], then array [0] will be altered instead 
of array [256]! Clearly a compiler that used the sequence: 


i nc 

L3_i+1 

; Increment lower byte 



- - Interrupt -------- 

bne 

L4 

; IF not zero THEN continue 

i nc 

L3_i 

; ELSE increment upper byte 
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Table 10.9 clockO configured as an interrupt function. 

Microtec Research ASM68008 V6.2a Page 1 Thu Apr 20 11:16:34 1989 

Line Address 

1 * Paragon MCC68K Compiler Version 3.1 

2 OPT NOABSPCADD,E,CASE 


3 

4 

5 


6 

00000400 

48E7 

COCO 


.ci ock: 

7 

00000404 

202F 

0014 



8 





* 1 

9 





* 2 

10 





* 3 

11 





* 4 

12 





* 5 

13 

00000408 

207C 

0000 

E000 


14 

0000040E 

5210 




15 

00000410 

1010 




16 

00000412 

OCOO 

003B 



17 

00000416 

6332 




18 





* 6 

19 





* 7 

20 

00000418 

4239 

0000 

EOOO 


21 





* 8 

22 

0000041E 

207C 

0000 

E002 


23 

00000424 

5210 




24 

00000426 

1010 




25 

00000428 

OCOO 

003B 



26 

0000042C 

631C 




27 





* 9 

28 





* 10 

29 

0000042E 

4239 

0000 

E002 


30 





* 11 

31 

00000434 

207C 

0000 

E004 


32 

0000043A 

5210 




33 

0000043C 

1010 




34 

0000043E 

OCOO 

0017 



35 

00000442 

6306 




36 





* 12 

37 





* 13 

38 

00000444 

4239 

0000 

E004 


39 





* 14 

40 





L4: 

41 





* 15 

42 





_L3: 

43 





* 16 

44 





L2: 

45 





* 17 

46 

0000044A 

4CDF 

0303 



47 

0000044E 

4E73 





1n stl0_9 IDNT 

SECTION 9,,C 

XDEF .clock 

MOVEM.L D0/D1/A0/A1,-(SP) 

MOVE.L 20(SP),D0 
unsigned char Seconds,Minutes,Hours; 
#define SINTERRUPT 
void clock(void) 

{ 

if(++Seconds>59) 

MOVE.L #.Seconds,AO 
ADDQ.B #1,(AO) 

MOVE.B (AO),DO 
CMPI.B #59,DO 
BLS.S _L2 

econds=0; 

CLR.B .Seconds 

if(++Minutes>59) 

MOVE.L #.Minutes,AO 
ADDQ.B #1,(AO) 

MOVE.B (AO),DO 
CMPI.B #59,DO 
BLS.S _L3 
{ 

Mn nutes=0; 

CLR.B .Minutes 

if(++Hours>23) 

MOVE.L #.Hours,AO 
ADDQ.B #1,(AO) 

MOVE.B (AO),DO 
CMPI.B #23,DO 
BLS.S _L4 

{ 

Hours=0; 

CLR.B .Hours 

} 

} 

} 

return; 

MOVEM.L (SP)+,D0/D1/A0/A1 
RTE 


48 




SECTION 

14, ,D 

49 




XDEF 

. Seconds 

50 

OOOOEOOO 

00 

.Seconds: 

DCB.B 

1,0 

51 

0000E001 

00 


DCB.B 

1,0 

52 




XDEF 

. Minutes 

53 

0000E002 

00 

.Minutes: 

DCB.B 

1,0 

54 

0000E003 

00 


DCB.B 

1,0 

55 




XDEF 

. Hours 

56 

0000E004 

00 

.Hours: 

DCB.B 

1,0 

57 

0000E005 

00 


DCB.B 

1,0 

58 




END 
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LDD L3_i 

ADDD #1 

STD L3_i 

would be better; however, long 4-byte integers will still be prone to disjoint global 
problems like this. 

C compilers for 8-bit processors normally use absolute memory locations to 
hold floating-point numbers, rather than internal registers, and this non-recursive 
mode makes floating-point arithmetic particularly prone to this problem. Even 
16/32-bit devices, which can handle all sizes of integers in one indivisible ma¬ 
chine instruction, require multiple floating-point operations, unless using a math¬ 
ematical co-processor. Thus, in general it is inadvisable to use floating-point 
global variables which can be altered by interrupt service routines. Similar con¬ 
siderations apply to any global compound-element structure and multiple-byte 
integers for 8-bit MPUs. Of course it is always possible to mask out interrupts 
during sensitive processing. 

Interrupt problems occurring due to disjoint operations are particularly per¬ 
nicious because they appear very rarely and apparently at random. As they are 
not reproducible to order, it is virtually impossible to track them down! 

If global variables have to be shared, the normal advice is to ensure that only 
the highest order of interrupt service function making use of the variable actually 
does the changing. Here the background function is treated as level 0. Thus 
in our Real-Time Clock, the interrupt function clockO is permitted to change 
the global variables Seconds, Mi nutes and Hours, with the background and any 
lower priority interrupts only reading these variables. Higher priority interrupt 
functions should not make any reference to these variables. 

This procedure is not foolproof. Consider a background function turning off 
the central heating pump each morning at 9 am, that is 09:00:00. It turns the 
pump off and on by pulsing a toggling flip flop. It is now 09:59:59. The program 
reads Hours as 09. Getting interested, it is about to read Minutes when an in¬ 
terrupt occurs and alters the time to 1 0:00:00. On return, Mi nutes and Seconds 
are then read as 00:00, and the processor thinks it is 9 am, toggles the flip flop 
and turns the pump on again! This may happen perhaps once a year, but when it 
does, the switching will continue at 180° from the proper sequence. The cure is 
to mask out the interrupt when the time is being read, or to read it several times 
in quick succession — and not to use a toggling flip flop as the pump interface! 


10.3 Initializing Variables 

Targeting C to a ROM-based system presents problems concerning the data por¬ 
tions of the program. This is where variables go when they are stati c and/or 
global (extern). Recall from Section 8.2: 

1. auto variables can be initialized in their definition, but the resulting code is 
identical to a definition subsequently followed by an assignment. As shown in 
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Tablo l8. iV b). these fixed values are moved into memory each time the local area 
in which their scope applies is entered, that is run-time setup. Uninitialized 
variables have an indeterminate value until assigned. 

2. stati c and global variables (stati c or otherwise) can be initialized in their 
definition. The resulting code leads to a compile-time set-up, where the con¬ 
stants are placed in memory by the loader, see Table liOl a). When the program 
starts, it assumes that these values are already in situ, put there by some out¬ 
side agency (the loader). On subsequent executions, any altered variables will 
not regain their original values, unless a load precedes the run. Uninitialized 
stati c/global variables are given an explicit zero value, as for example in 
Table fTTUOl 

3. static or global objects declared const, are placed by the compiler in the 
text area of memory. In an embedded system, this will be in ROM, and is use¬ 
ful for look-up tables and string constants. Such objects are always present, 
with their initial values placed there by the one and only load into the EPROM 
programmer. Table fiO shows an example of this situation. 

Category-2 above constitutes a problem in a ROM-based system, as there is 
no load prior to each run, and therefore RAM-based stati c and global variables 
will not have their pre-initialized values. The state of RAM on power-up is in¬ 
determinate. The obvious way around this is to adapt the style of the C source, 
so that no assumptions are made regarding the initial values of such variables. 
Although most algorithms are amenable to this approach, there are pitfalls to 
trap the unwary. 

The standard problem here is the use of libraries. Since this is code written 
by someone else, you can never quite be sure how initialized data is handled. In 
practice, library routines likely to be used by embedded systems, will probably 
not use pre-initialized stati c/global data, but beware of file and I/O routines. 

Although pre-initialized variables can usually be avoided, the safest approach 
is to use the startup routine to initialize the data program sections in RAM. To 
do this, the compiler must arrange for an image of initialized data to be present 
in ROM (usually following the text area). On startup, this is copied byte by byte 
to the correct place in RAM before going to main(). The actual details of how 
this is done vary considerably from compiler to compiler. To give the reader an 
overview, we will briefly look at three products, the Aztec C68K/ROM V3.30c, the 
Microtec Paragon MCC68K V3.1, and Cosmic/Intermetrics V3.3 compilers. 

The C language specifies that all stati c and/or global variables are assumed 
to be zero unless explicitly pre-initialized in their definition. In an embedded 
system this can simply be implemented by clearing the RAM chip(s) in their en¬ 
tirety at startup. It would be more efficient if only the appropriate number of 
bytes were cleared, although in a small system this startup burden is likely to be 
of little consequence. 

Most compilers place non-zero explicitly pre-initialized and default zero vari¬ 
ables in different but related data program sections. In our compiler examples the 
non-zero pre-initialized variables go into DSEC, Secti on 13 and . PSECT _data. 
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Uninitialized (or explicitly zeroed) stati c/global variables go into BSS, Secti on 14 
and . PSECT _bss respectively (BSS is an archaic expression Block Start Symbol, 
originally used to denote a block of memory common between various programs). 
The three compilers put the program into TSEC, Section 9 and . PSECT _text 
respectively. Table [T0791 shows the Paragon code using Section 9 for program 
code (line 4) and Section 14 for the three uninitialized global variables repre¬ 
sented by the labels .Seconds, .Minutes, . Hours in line 48. Table [T()75l b) shows 
the use of the _bss segment for the uninitialized stati c variable i. 

To assist writing the startup routine, the programmer needs to know, for ex¬ 
ample, how long the two data sections are and where they start. Most linkers 
create certain reserved public symbols giving this information. In the case of our 
three example compilers these are: 

Paragon: 

?RAM_START 
?R0M_START 
?R0M_SIZE 


Where the data section begins. 
Where the image begins in ROM. 
How many bytes the image is. 


Aztec: 

_H0_org 

_Hl_org 

_H2_org 


& _H0_end 

& _Hl_end 

& _H2_end 


Code segment start and end+1. 

Initialized data segment start and end+1. 
Uninitialized data segment start and end+1. 


Cosmic: 

_text_ 

_data_ 

_bss_ 


Code segment end+1. 

Initialized data segment end+1. 
Uninitialized data segment end+1. 


The linker allows the programmer to set the starting address of each section 
separately. If the BSS/Secti on 14/_bss sections are not biased in this way, then 
they normally follow directly on from the corresponding DSEG/Secti on 13/_data 
section. 

Finally how do the compilers produce an image of the pre-initialized data 
in ROM? The Aztec compiler does this automatically with the image following 

on directly from the TSEG portion; that is starting at_H0_end. Its length is 

_Hl_end -_Hl_org. Using this information, a possible startup for this Aztec 

compiler is shown in Table [TOJOl This is written as an extension to Table [TcOl 
but using the Aztec's assembler syntax (standard Motorola). Operation is self- 
evident from the comments; however, note that if a segment does not exist, then 
the org and end labels are made the same by the linker, so a zero difference 
signals non-existence. 

The Paragon product provides two library routines, which help this copy pro¬ 
cess. These are . i nitdata and memcl r(). The former is designed to be called 
directly from the startup routine, for example jsr .initdata, and takes no pa¬ 
rameters directly. The latter is normally used from the C program, requiring a 
pointer to the first byte and an i nt count. 
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The linker must be informed through its command file (see Table l7T0l ) that an 
image of Secti on IB is required, by using the directive ini tdata. Thus in line 19 
of Table [TO.llL we tell the linker to generate an image of Section IB starting 
at 6000b (unfortunately there is no symbol denoting the end of Secti on 9, the 
text). In the startup, j s r .ini tdata will then use the linker-generated symbols 
automatically to do the copying. Line 10 informs the linker that Section 14 
(uninitialized variables) is to follow Section 13. In doing this, RAM can easily 
be cleared from ?RAM_START+?ROM_SIZE upwards. 

In the Cosmic/Intermetrics compilers, the linker is followed by a hexer utility, 
which generates machine code in the requested format for the EPROM program¬ 
mer (see Table [73). Each program section can be shifted to a new start point by 
the hexer; however, as the text remains unaltered, the program still assumes its 
data is at the li nk er's (original) data bias. Thus, to produce an image of the data 
section following on from the text we have: 


Table 10.10 A startup for the Aztec compiler initializing statics/globals. 


* Startup for Aztec 68K v3.30c 

* Copies initial values of statics/globals into RAM 


SSP: 

PC: 


public _H0_org, 

public _Hl_org, 

public _H2_org, 

cseg 

dc.l SAOOO 

dc.l _Start 
ds.l $3F8 


_H0_end 

_Hl_end 

_H2_end 

Code segment 
Initial setting of SSP 
Restore PC value 
Skip up to 0400h 


_Start: movea.l #$1000,aO 

move a0,usp 


; Prepare to set up User Stack Pointer 
; Privileged instruction 


; calculate length of initialized data segment in D0.L 

move.l #_Hl_end,d0 ; End of DSEC, +1 

sub.l #_Hl_org,d0 ; Start of DSEC 

beq ZER0_0UT ; Don't copy if none 

lsr.l #l,d0 ; Convert to word length 

; Now copy initialized data image in ROM in RAM 

movea.l # H0_end,a0 ; Image is at end of TSEC 

movea.l # Hl_org,al ; Point to start of DSEC 

L00P1: move.w (a0)+,(al)+ ; Copy each word 

dbf d0,L00Pl ; DO holding count 


; Calculate length of unitialized data segment in D0.L 

ZER0_0UT: move.l #_H2_end,d0 ; End of BSS, +1 

sub.l #_FI2_org,d0 ; - start of BSS 

beq CONTINUE ; Don't zero out if none 

lsr.l #l,d0 ; Convert to word length 

; Now zero unitialized section in RAM 

movea.l #_FI2_org,a0 ; A0 points to base of BSS 

L00P2: clr.w (a0)+ ; Clear each word 

dbf d0,L00P2 ; Until reaches -1 


CONTINUE: andi #$DFFF,sr 
jsr _main 
bra _Start 
end 


Clear bit 13 to change to user state 
Co to C routine 
Repeat if returns 
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Table LO.ll A typical lod68k command file to produce an image of initialized data in ROM for use 
in the startup code. 






This is a prototype command file for the Microtec linker 
Puts a copy of initialized data in ROM for the startup 


Section 0 is for the entry code, e.g. vector table, in ROM 
Section 9 is for the program in ROM usually 

Section 13 holds initialized local static and global variables, in RAM 
Section 14 is for other vars, e.g. Global and uninitialized statics 
Put initialized static/globals after uninitialized same 


order 13,14 
sect 0=0 
sect 9=0400h 
sect 13=OEOOOh 
absolute 0,9,13 


- Put Section 14 after 13 
* Vector table starting at 00000 
v Program starts at 0400h 
v Any data is at EOOOh up (RAM) 

- Put only these ROM sections in .hex file 
Copy section in ROM at zzzzzh for initialized local static 
data produced in Section 13 in RAM, if relevant 
In entry program subroutine .initdata will copy it back 
always at runtime into RAM 

initdata 13,6000h 

1ist d,s,t,x,c * Public symbols in object module 

*; Local symbol table to object module; Lists it; and public 
symbol table; Produces a cross-reference listing 


load startup 
load fred 

load 68000.1ib\mcc68kab.1ib 
end 


Start up assembler routine 
* Then the compiled user program 
and absolute library 


_ Name of linker program 

_ Relocate data stream into ROM 

|_ beginning at E080 h 

hex09 -db 0xE080 -o fred.hex fred.xeq 

I_ Name of input file from linker 

_ Name of output file 

which says produce the (Intel coded) machine code file with the data bias (-db) 
reset to EO 8 O/ 1 . The output file is called fred. hex and the input (from the linker) 
is fred. xeq. The net result of this process is to create a copy of the initialized 
data in ROM, beginning at EO 8 O /1 but leaving the actual data area unchanged. An 
example is given in Table fTTT. 12( b). 

The Cosmic/Intermetrics 6809 compiler does not produce start_of labels (e.g. 
Text segment start), but including all programs sections in the startup routine, 
as shown in Table [TU. 12 l a), defines these local symbols according to the biases 
set in the linker. Thus if the linker's data bias is 1, then Start_data is 0001 h. 
This routine is si mi lar to that of Table HcTTol but with differing symbols. 

Cosmic/Intermetrics provide a utility top rom with their compilers version 3.32 
and up, to modify their linker output to create this image automatically. The 
starting address in RAM and end address of this image in ROM are also embedded 
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Table 10.12 A startup for the Cosmic compiler, initializing statics/globals and setting up the DPR for 
zero page. 

.processor m6809 


Startup routine for Cosmic 6809 V3.3 supporting zero page 
and copying initial values of statics/globals into RAM 


» 

.external 

main, texl 

Start_data: 

.psect 

data 

Start_bss: 

.psect 

_bss 

Start_zero: 

.psect 

_zpage 


.psect 

_text 

_Start: 

1 ds 

#0800h 

; Now clear 

bss region 


1 dx 

#Start bss 

L00P1: 

cmpx 

# bss 


beq 

INIT_DATA 


cl r 

0, x+ 


bra 

LOO PI 

; Now setup 

data region 

INIT_DATA: 

1 dx 

#Start_data 


1 dy 

#OE080h 

L00P2: 

cmpx 

#_data_ 


beq 

ZER0_PAGE 


1 da 

0, x+ 


sta 

0,y+ 


bra 

L00P2 

; Set up DPR to point 

to zero page 

ZER0_PAGE: 

ldd 

#Start_zero 


tfr 

a,dp 


jsr 

_mai n 


bra 

_Start 


.public 
. end 

_Start 


; Define beginning of data section 
; Define beginning of bss section 
; Define beginning of zero page section 

; Point Stack Pointer to top of RAM 

Point to beginning of BSS 
End yet? 

IF yes THEN move on 

ELSE clear byte and advance pointer 


Point to beginning of data 
Point to beginning of image 
End yet? 

IF yes THEN move on 
ELSE get byte 
and move it 


Start of zero page 

Top byte to Direct Page register 

Co to C code 

Should it return then repeat 
Make this routine known to linker 


(a) Assumes an image of the data lies at E080 h on up. 

:20E0000034463362AE5E8COOOC2F074F5F8E0000200F8E0001EC5E58495849308BEC02AE3A 
:05E020008432C435C08C 



:00E000011F 

(b) Copying the data segment of a modified Table Hm into ROM at EO 8 O /1 upwards. 


into the start of this ROM record, and are used by their provided startup routine. 
This works in the same way as outlined above, but with less hassle. 

As can be seen in Table [T0.12I this compiler supports the use of the 6809's 
direct page (or zero page) address mode (see page [35} as a non-ANSII extension. 
Any static or extern data object can be placed into the assembler's _zpage 
program section by preceding it by the directive @di r. Thus, altering line C3 in 
Table dOA] to: 


@di r static int i; 
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will change the . psect _bss to . psect _zpage and the two following INC com¬ 
mands will use Direc t rather than Extended addressing, as shown in Table ful, 1 'll 
(see also Table [T4~8h All such objects in the file can be placed in the zero page 
by using the ANSII directive #pragma: 

#pragma space[]@dir 

but it must be remembered that a page in the 6809’s space is only 256 bytes long. 


Table 10.13 Zero-page storage with the Cosmic 6809 compiler. 


1 



; Compilateur C 

pour MC6809 

2 




.list 

+ 

3 




.processor m6809 

4 




.psect 

zpage 

5 



L3_i 

: .byte 

[2] 

6 



; 1 

mai n() 


7 



; 2 


{ 

8 




.psect 

_text 

9 



; 3 


@di r static 

10 



1 4 


whi1e(l) 

11 

E000 

0C01 

LI: 

i nc 

L3_i+1 

12 

E002 

2602 


jbne 

L4 

13 

E004 

ocoo 


i nc 

L 3_i 

14 

E006 

20F8 

L4: 

jbr 

LI 

15 



; 5 


} 

16 




.pub!ic _main 

17 




. end 



{!++;} 


The bias for this page can be set in the linker; for instance: 
ln09 +zpage -b0x8000 

sets it to 8000 h. This will be the value of Start_zero in Table fT0.12l Bringing 
this down to Accumulator_D and then doing a TFR A, DP sets up the Direct Page 
register to the upper address byte (80h in this example) as required. 

I have assumed in Table !! O.TTT a) that the initial state of the zero page does not 
matter. If it does, then all 256 bytes can be cleared or an image copied from ROM. 


10.4 Portability 

To the microprocessor engineer, portability is one of the major attractions of 
a high-level language. Thus a company upgrading a 6502-based product line 
to, say, the 68000 family, can continue to use the bulk of the original software, 
without a substantial change. In reality, the migration of software between dif¬ 
fering systems, at the lowest to the highest level, is fraught with difficulties to 
the unwary [6). 

As an example of low-level problems that can occur, most of the newer fa mi lies 
of MPU are software downwards compatible. Thus the 80386 MPU has an 8086 
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emulation mode and the 68020 MPU is object code compatible to the 68000. Con¬ 
sider the CLR <memory> instruction in the 68000/8 MPU. This is implemented as 
a classical read-modify-write operation, although the data read is irrelevant (see 
page!25i. This means that the address of <memory> is put out on the address bus 
twice. A devious hardware engineer may deliberately make use of the resulting 
double address decoder pulse, by using CLR, say, to increment a counter twice. 
At some time later, probably after this ingenious engineer has left, the company 
decides to upgrade to a 68020-based microcomputer. They have been assured 
the 68000 code will directly run under 68020 control. So it does, or does it? Mo¬ 
torola have speeded up CLR on the 68020 MPU and subsequent family members, 
by dispensing with the initial useless Read cycle, ergo a counter incrementing at 
half its proper rate! Abstruse bugs like this are difficult and very expensive to 
unearth, but abound where software is migrated between systems. 

At the higher level, one solution to the portability problem is to define a 
virtual machine (i.e. having a hypothetical structure) together with a UNiversal 
Computer-Oriented Language (UNCOL) m. Each physical machine would have a 
translator from UNCOL to its particular machine code. With such a scheme, a 
high-level language would only require the one machine-independent compiler 
to UNCOL. 

Unfortunately no UNCOL exists in practice, although several half-hearted at¬ 
tempts towards this goal have been made. At one time, A-natural ® was in vogue 
as a kind of standard assembly language, but its close relationship to the 808x- 
MPU family led to its eventual demise. Some software engineers consider C as 
an UNCOL. Certainly its origins as a high-level assembly language used to port 
the operating system UNIX to various hardware hosts El would seem to fit it into 
that role. Amongst its other virtues, the relative lack of dialects, now enforced by 
the ANSII standard, makes C one of the most portable of the higher languages. 
But even here, 100% portability is a pipe-dream, and the term transportable is a 
more apt description. 

Considerations of portability depend on the type and scope of the software. 
This can roughly be categorized as follows: 

1: Operating System independent 

A program in this category will run in the same way, irrespective of its cocoon¬ 
ing operating system. Thus, for example, the program given in Table fT07T4l a) 
should execute equally well on a Hewlett Packard Apollo work station (680x0- 
based) under UNIX and on an IBM PC (80x86-based) under MSDOS. 

2: Operating System specific 

Programs which take advantage of special features of some operating system, 
and can therefore only run on hardware supporting that operating system. 

3: System and machine specific 

A further restriction on category-2, but also relies on a specific hardware fea¬ 
ture. Hardly portable at all! 

4: Unhosted 

Typical of embedded microprocessor circuits. Cannot rely on ANSII-standard 
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I/O functions. Both super portable and not portable at all! 

Old C had only a de facto standard library, as defined by Kernighan and 
Ritchie flOl . Compiler writers were free to provide any library functions they 
felt like. This of course made porting software a nightmare, unless the same 
compiler was available across the target range. The ANSII standard now provides 
an essential core of standardized library functions, which must be available no 
matter what the eventual target is m Thus, in principle, using C compilers 
conforming to this standard should make category-1 portability easy to achieve. 

Two of these standard functions are used in Table QjTTTJ scanf () is a format¬ 
ted Read function, taking input from the standard input channel stdi n (usually 
the keyboard), according to a list of format tokens ITCH . Thus: 

scanf("%u",&n); 

means go to stdi n and get an unsigned decimal integer (%u), which will be put 
away at the address of n (i.e. assigned to n). Other formats tokens are %d, %1 d, 
%x, %f etc., for Decimal integer, Long Decimal integer, hexadecimal integer and 
decimal Floating-point. 

printfO is the formatted write to standard output function counterpart 
(stdout is normally the VDU screen or printer), which sends messages with vari¬ 
able values replacing embedded format tokens |T2]. Thus: 

printf("The sum of all integers up to %u is : %1u\n",num,sum); 

prints the message in quotes, with the format token %u replaced by the decimal 
value of num at that point in the program, and the long decimal value of sum 
likewise. Notice the use of \n to give a new line. Tabic ITT). 1 41 b) shows a run-time 
example. 


Table 10.14 A portable C program using ANSII library I/O routines. 
#include <stdio.h> 
mai n() 

{ 

unsigned short num,i; 
unsigned long sum; 
printf("Enter number \n"); 
scanf("%d", &num); 
for(sum=0,i=num; i>1; i--) 

{sum += i;} 

printf("The sum of all integers up to %d is : %ld\n", num,sum); 
} 

(a) C source code. 

Enter number 
35 

The sum of all integers up to 35 is : 629 


(b) Typical run. 
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Compilers come with a set of header files, giving amongst other things, proto¬ 
types of all the library functions. Some of these are <stdi o. h> for the standard 
input/output functions and <stdl i b. h> for utility functions. Table [9717)1 shows 
a typical <math . h> mathematics function header. 

Unfortunately, even with the ANSII standard, many details are left as imple¬ 
mentation dependent. For example, the size of ints (typically 16 or 32 bits), 
whether an unqualified char is signed or unsigned, the direction of truncation 
for / (divide) and the sign of the result for % (remainder) are machine-dependent 
for negative operands. File handling, for example rules for naming, and various 
system-related constants, such as the End of File constant (EOF is usually -1), are 
operating-system specific. 

Most implementation and operating-system foibles tend to be obscure and 
difficult to track down. As a simple example of the former, consider the code 
fragment: 

i nt i ; 

for (i=0; i<32768; i++) {do this;} 

This will work perfectly well in an implementation which maps i nt on to a 32-bit 
word, but this will be done forever on a 16-bit implementation with its largest 
value of+32767 (7FFFh). 

To reduce the possibility of this kind of problem, system-dependent variables 
should be gathered together into a header file, which can easily be altered if the 
software is transposed. Also standard types can be defined. Thus: 

SIGNED_32 i; 

for (i=0; i<32768, i++) {do this;} 

where the header contains the typedef 

typedef long SIGNED_32 

for a 16-bit implementation and 

typedef int SIGNED_32 

for a 32-bit int size. 

Programs generated by a compiler with extended libraries and/or using spe¬ 
cial operating-system specific features, are category-2 portable. A considerable 
rewrite will be necessary to port such software to different machines, especially in 
the latter situation. Severe problems arise where some special host hardware fea¬ 
ture is utilized. Graphics-oriented software frequently comes into this category-3, 
and the concept of portability to another operating system/host is then virtually 
meaningless. 

Porting embedded microprocessor C code presents the engineer with its own 
set of particular problems. Provided that the source does not make use of non- 
ANSII features, the bulk of raw code will translate to any target. C compilers 
are available for the majority of CPUs, from mainframe down to microcontroller. 
Some examples resulting from our sum-of-integers source are presented without 
comment in Table [T07T51 






PORTABILITY 301 


Table ful. 1 51 Compiling the same source with a spectrum of CPUs (continued next page). 


: ts=8 


:ts=8 


; mai n(n) 


; mai n(n) 


;unsigned char 

n; 

;unsigned char n; 

public 

mai n 


public 

mai n_ 

_main: link 

a6,#.2 

mai n_ 

jsr 

.csav# 

movem.1 

■3,-Csp) 


fcb 

.3 

;{ 



fdb 

.2 

;static unsigned short sum; 




bss 

.4,2 

;static unsigned short 

;for(sum=0;n>0 

n-) 


bss 

.4,2 

cl r.w 

.4 

; for 

(sum=0; 

n>0;n--) 

bra 

.8 


stx 

.4 

; {sum+=n;} 



stx 

.4+1 

.7 move.1 

#0, dO 


jmp 

.6 

move.b 

ll(a6),dO 

.5 

cl c 


add .w 

dO, .4 


1 da 

#255 

.5 sub.b 

#1,ll(a6) 


1 dy 

#11 

.8 tst.b 

ll(a6) 


adc 

(4) , Y 

bhi 

.7 


sta 

(4) , Y 

;return(sum); 



1 da 

#255 

.6 move.1 

#0, dO 


adc 

#0 

move.w 

.4, dO 

.6 

1 dy 

#11 

.9 movem.1 

(sp) + , .3 


1 da 

(4) , Y 

uni k 

a6 


sta 

24 

rts 



stx 

25 

;} 



txa 


. 2 equ 

0 


cmp 

24 

. 3 reg 



sbc 

25 

dseg 



jcs 

.7 

end 


» 

{sum+=n 

;} 




1 da 

(4) , Y 

(a) Aztec 68000 MPU V3.30c. 


sta 

24 




stx 

25 

; mai n(n) 



cl c 


junsigned char 

n; 


1 da 

.4 

_main: push 

b P 


adc 

24 

mov 

bp, sp 


sta 

.4 

mov 

cx,word ptr 4[bp] 


1 da 

.4+1 




adc 

25 

;static unsigned short sum; 


sta 

.4+1 

;for (sum=0;n>0;n--) 


jmp 

.5 

mov 

word ptr [026c],0 

;return(sum); 

jmp 

L2 

.7 

1 da 

.4 

; {sum+=n;} 



sta 

8 

LI: add 

word ptr [026c],c 


1 da 

,4+lx 

dec 

cx 


sta 

9 

L2: or 

cx, cx 


rts 


jne 

LI 

;} 



;return(sum); 


.2 

equ 

0 

mov 

ax,word ptr [026c] 

.3 

equ 

0 

;} 



public 

.begin 

pop 

bp 


dseg 


ret 



cseg 


.public _ 

_mai n 


end 



end 


(c) Aztec 6502 MPU V3.20c. 


(b) Zortech 8086 MPU V3.0 with debugger VI.02. 
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Table lTC). I 51 Compiling the same source with a spectrum of CPUs (continued next page). 



NAME 

summ(18) 


NLIST 

D 


RSEC 

CODE(O) 


LIST 

E, L 


RSEG 

UDATA(O) ; 

Version 

1.5 Compiler 860818 P Code Gen 


PUBLIC 

main (150,191) ; 

Source: 

summ Prog: 

Date: 25-MAY-1989 12 


EXTERN 

7CL6801 1 15 L07 


NAME 

summ 


RSEG 

CODE 


DSEG 


* 1. 

mat n(n) 



DEFS 

00002H 


P6801 



EXTPUB 

GLOB. 

* 2. 

unsigned char n; 

GL0B_: 

STKLN 

100H 

* 3. 

{ 



EXTRN 

ENTZ2. 

mai n: 

PSHB 



XSEG 



PSHA 


CONST. 

: CSEG 


* 4. 

static 

unsigned short sum; 

M_summ 

: IP 

ENTZ2. 

* 5. 

for (sum=0;n>0;n--) 

; l. 

mai n(n) 



CLRB 



EXTPUB 

MAIN. 


CLRA 


; 2. 

unsigned char n; 


STD 

70000 

; 3. 

{ 


70002 

TSX 


; 4. 

static unsigned short sum; 


LDAB 

1,X 

; 5. 

for (sum=0; 

n>0;n—) 


PSHB 


; 6. 

{sum+=n; 

} 


CLRB 


; 7. 

return(sum) 

1 


TSX 


MAIN : 

PUSH 

HL 


SUBB 

0,X 

; line 3 



INS 


; line 5 



BCC 

70001 


XOR 

A 

* 6. 

{sum+=n ;} 


LD 

(GLOB +0FFFFH),A 

70003 

TSX 


Q_2: 

LD 

HL,00000H 


LDAB 

1,X 


ADD 

HL, SP 


CLRA 



LD 

A, (HL) 


PSHB 



AND 

A 


PSHA 



HR 

z,Q_l ; 


LDD 

#70000 


1R 

0-3 ; 


PULX 


Q_4: 

LD 

HL,OOOOOH 


PSHB 



ADD 

HL, SP 


PSHA 



LD 

A, (HL) 


PSHX 



DEC 

A 


PULA 



LD 

(HL) , A 


PULB 



3R 

0-2 ; 


PULX 


Q_3: ; 

line 6 



ADDD 

0,X 


LD 

HL,OOOOOH 


STD 

0,X 


ADD 

HL, SP 


TSX 



LD 

C , (HL) 


PSHB 



LD 

A,(GLOB +0FFFFH) 


PSHA 



ADD 

A,C 


PSHX 



LD 

(GLOB +0FFFFH),A 


PULA 



3R 

0-4 ; 


PULB 


0-1: ; 

line 7 



PULX 



LD 

A,(GLOB +0FFFFH) 


ADDD 

#1 


LD 

L, A 


PSHB 



LD 

H,OOOOOH 


PSHA 



POP 

DE 


PSHX 



RET 



PULA 


1 8. 

} 



PULB 



DSEG 



PULX 



0RG 

GLOB. 


LDAB 

o,x 


XSEG 



DEC 

o,x 


0RG 

CONST. 

* 7. 

return(sum); 


END 



BRA 

70002 

; Code 

Bytes: 49 

(4) Constant Bytes: 0 

70001 

LDD 

70000 

; Data 

Bytes: 2 


* 8. 

} 


; Constant Bytes: 

0 

70005 

PULX 






RTS 


(e) Microtec Z80 

MPU VI.5. 


RSEG 

UDATA 




70000 

RMB 

2 





END 


(d) IAR 6801 MCU V1.15/MD2. 
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Table 10.15 (continued! Compiling the same source with a spectrum of CPUs. 


Transputer DECODE (VI.2) of t_sum. 

. bi n 

1 

smain(n) 


ID T8 "Occam 2 V2.1" 



2 

unsigned char n 

; 

"CC_transputer V2.0" 




smain: .entry 

smain,Am,r2,r3 

SC 0 





subl 2 

#4,sp 

T0TALC0DE 148 0 





movab 

JDATA,r3 

STATIC 2 




3 

{ 


1 main(n) 




4 

static unsigned 

short sum; 





5 

for(sum=0;n>0;n 

-) 

CODESYMB "main" 

00000030 




cl rw 

Cr3) 


71 00030 

ldl 

1 


moval 

4(ap),r2 


30 00031 

ldnl 

0 


movzbl 

(r2),rO 

20 

20 00032 

ldnl 

MODNUM 


beql 

sym.2 

BF 

60 00034 

ajw 

-1 


nop 



DO 00036 

stl 

0 

6 

{sum+=n;} 



2 

3 

4 

5 


6 


7 


8 


unsigned char n; 

{ 

static unsigned short sum; 


sym.l: movzwl (r3),rl 

movzbl (r2),r0 
add!2 rl,r0 


for(sum= 

0;n>0;n 

-) 



cvtlw r0,(r3) 

40 

00037 

ldc 

0 


decb (r2) 

70 

00038 

ldl 

0 


movzbl (r2),r0 

El 

00039 

stnl 

1 


bneq sym.l 

13 

0003A 

ldlp 

3 

7 return(sum); 

FI 

0003B 

lb 


sym.2 

: movzwl (r3),r0 

40 

0003C 

ldc 

0 


ret 

F9 

0003D 

gt 


8 } 


A0 21 

0003E 

ci 00050 



{sum+ 

=n;} 



(h) DEC 

V.2.3-024 VAX 750 minicomputei 

70 

00040 

ldl 

0 


NAME test(16) 

31 

00041 

ldnl 

1 


RSEC CODE(O) 

13 

00042 

ldlp 

3 


RSEG DATA(O) 

FI 

00043 

lb 



PUBLIC main 

F5 

00044 

add 



EXTERN 7CL6811 3 00 L07 

70 

00045 

ldl 

0 


RSEC CODE 

El 

00046 

stnl 

1 


P68H11 

13 

00047 

ldlp 

3 

1 

main(n) 

FI 

00048 

lb 


2 

unsigned char n; 

41 

00049 

ldc 

1 

3 

{ 

F4 

0004A 

diff 


mai n: 

PSHB 

13 

0004B 

ldlp 

3 


PSHA 

FB 23 

0004C 

sb 


4 

static unsigned short 

OA 61 

0004E 

i 0003A 

5 

for(sum=0;n>0;n—) 

return(sum); 




CLRB 

70 

00050 

ldl 

0 


CLRA 

31 

00051 

ldnl 

1 


STD 70000 

Bl 

00052 

ajw 

1 

70002 

: TSX 

FO 22 

00053 

ret 



LDAB 1,X 


} CMPB #0 

BLS 70001 


(f) Parallel C INM0S Transputer T825 V2.0. 


6 {sum+=n;} 


70003: TSX 


; Compilateur C 

pour MC68HC16 (COSMIC-France) 

LDAB 1,X 


.psect 

_bss 

CLRA 


. even 


ADDD 70000 

L3_sum: 

. byte 

[2] 

STD 70000 


.psect 

_text 

DEC 1,X 


. even 


7 return(sum); 

_main: 

pshm 

x,d 

BRA 70002 


tsx 


70001: LDD 70000 


. set 

OFST=0 

8 } 


cl rw 

L3_sum 

PULX 

LI: 


; line 5, offset 7 

RTS 


ldab 

OFST+3,x 

RSEC DATA 


beq 

Lll 

70000: FCB 0,0 


cl ra 


END 


addd 

L3_sum 



std 

L3 sum 

(i) IAR 6811 MCU V3.00E. 


dec 

OFST+3,x 



bra 

LI 


Lll: 


; line 6, offset 27 



ldd 

L3 sum 



ldx 

0, x 



ai s 

#4 



rts 




.public 

_mai n 



. end 

(g) COSMIC/Intermetrics V.3.32 6816 MCU. 
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Most remarks made previously also apply to this category-4 portability. In 
particular the hardware-oriented nature of free-standing systems leads to code 
which makes assumptions concerning the structure in memory of data. For ex¬ 
ample, byte ordering in some processors places the most significant bits of a 
word in the lower byte address (the so called Little-Endians); others do the oppo¬ 
site (the Big-Endians). Thus breaking up a 16-bit i nt word into two chars named 
bytel and byte2 in this manner: 

bytel = word/256; byte2 = word; and bytel = word>>8 ; byte2 = 
word; 

will only be equivalent for the latter case. Particular problems arise in recon¬ 
ciling targets with segmented address spaces and special I/O instructions (e.g. 
the 80x86 family) to code targeted to processors with a linear address space and 
memory-mapped I/O (e.g. the 680x0 family). 

In practice, most portability problems occur in handling I/O and files. Op¬ 
erating systems are designed to act as an insulating layer between applications 
programs (software that you write) and such considerations. Most small and 
medium-sized embedded systems are self-standing, or at most a ROM-based mon¬ 
itor may be resident. 

Without this decoupling, it is likely that the designer will have to write the 
startup/support code and library routines to handle, where applicable, inter¬ 
rupts, fault response, memory management and device protocol. A good deal 
of this is processor dependent, and so must be coded at assembler level, which 
by definition is non-portable. 

A larger embedded system may be able to support the overhead of a resident 
commercial operating system. The majority of standard operating systems are 
not suitable for this category of system, supposing as they do a fairly standard 
computer environment. More relevant real-time systems software can be pur¬ 
chased, but hardly add to the portability score. Sometimes a single-board com¬ 
puter may be available which mimics a standard computer architecture, such as 
an IBM PC. This can then be used in certain circumstances with a standard oper¬ 
ating system, such as MSDOS. A ROM version of the system software is available 
where a magnetic disk bulk storage unit is not required. 

An embedded configuration is characterized by a rich variety of I/O devices, 
such as lamps, 7-segment and alphanumeric LCD displays, switches, keypads, 
analog to digital converters and many more exotic examples. Using standard I/O 
library routines, such as in Table [TcTmI is hardly practicable in these situations. 
Instead special device drivers must be developed. These can be written in C, but 
care must be taken, as machine and architectural considerations intrude at this 
level. Standard ANSII and other library routines which do not access I/O can be 
utilized in the normal way. 

Where peripherals resembling standard computer terminals will be attached 
to the system, then the ANSII I/O routines can be used in the usual way. These 
routines, such as pri ntf () and scanf () as well as file input/output make use of 
the base routines putchar () and getchar (). Thus if putcharQ and getchar Q 
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Table 10.16 Tailoring the ANSII I/O functions to suit an embedded target. 
int putcharfunsigned char c) 

{ 


_asm("cl r. 1 
_asm("move.b 
_asm("jsr 
_asm("cl r. 1 
_asm("move.b 
} 


dO\n") ; 

7(sp),dO 
0x7FlE 
d7\n") ; 
dO, d7 


* Get c out of stack widened to int \n") ; 

* OUTCH \n") ; 

* Return(c); \n"); 


(a) The putcharO function in maximot. h. 

int getchar(void) 

{ 

_asm("clr.1 d7\n"); 

_asm("jsr 0x7F00 * INCH \n"); 

_asm("move.b d0,d7 * Return it \n"); 

} 

(b) The getcharO function in maximot.h. 

#include <maximot.h> 

#include <stdio.h> 
mai n() 

{ 

printf("Hello world"); 

} 


(c) A silly mai nQ function to print out a string. 


L5: 


_man n: 

- 5 


#i ncl ude 

<maximot.h> 

#i ncl ude 

<stdio.h> 

. byte 

72,101,1 

. byte 

114,108, 

mai n() 


{ 


. even 


link 

a6,#-4 

printf(" 

Hello world") 

move. 1 

#L5,(sp) 

jsr 

_pri ntf 

uni k 

a6 

rts 


.globi 

_mai n 

.globi 

_pri ntf 


This is the string "Hello world",0 


Open a frame to send a pointer to the string 

Push out the pointer to the string 

Goto the printf() function declared in <stdio.h> 

Close down the frame 

and return to the Startup routine 


(d) The resulting source code, with printfQ extracted from the library. 


are written to suit the target hardware, then the higher-order input/output library 
routines can be used in the normal way. 

As an example, consider a self-standing circuit based on an embedded 68000- 
MPU. This system runs under an operating system monitor which communicates 
with a terminal through a bidirectional serial link through a UART. The monitor 



306 C FOR THE MICROPROCESSOR ENGINEER 


has two subroutines to send and receive single characters along this link. Sub¬ 
routine OUTCH is located at 7F1 E h and sends out one 8-bit character located in the 
bottom byte of DO. Subroutine INCH at 7F00h waits until a character is received 
and returns with it in the lower byte of DO. 

The definition of the C function putchar () is: 

• Accepts an unsigned character as its single parameter 

• Returns this character as an i nt 

giving the declaration unsi gned char putchar () (i nt c), where c is the char¬ 
acter to be sent out. The definition of this function is given in Table fTTf.lbf a). 
and simply extracts c from the System stack (seven bytes up from SP), jumps 
to subroutine OUTCH and then widens and copies it to D7.L, the normal return 
register for this compiler (Cosmic V3.32). 

The definition of getcharO is: 

• Does not take any input parameter 

• Returns the received character, widened to an i nt 

• If there is a problem getting this character, then a special End Of File (EOF) is 
returned. In this compiler EOF is -1 (FFFFFFFFh) 

giving the declaration int getchar(void). The definition of this function is 
given in Table 110.161 b). As the monitor function INCH does not return an er¬ 
ror condition, the EOF protocol is not implemented (a more sophisticated INCH 
subroutine would detect problems such as parity violation or overrun). 

Finally to illustrate the concept, a main function printing out a simple string is 
shown in Table [I().16[ c). The two tailored functions are included as a header file 
<maxi mot. h>; alternatively they could be incorporated in a library. The library 
function pri ntf () is declared in the ANSII standard header file <stdi o. h> (for 
STanDard Input/Output). The actual use of pri ntf () is commented in the listing, 
and is straightforward in this simple example. The actual machine code produced 
by this example (with an integral-only version of pri ntf ()) was 2950 bytes. Al¬ 
though this may seem extravagant, pri ntf () is an extremely versatile and flex¬ 
ible function. If all we required of pri ntf () was to output fixed strings, the 
ANSII library function puts() (for PUT String) would give a much more econom¬ 
ical solution. Similarly gets() is a more limited input library function. A string 
in C is defined as an array of character codes terminated with 00 h (see line 5 
in Table H0.161 d)). Of course both gets() and puts() use the base functions 
getcharf) and putcharQ. 
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PART III 


Project in C 


In this part we follow through an embedded microprocessor-based product 
from inception through hardware and software design to the testing and debug¬ 
ging of a functional prototype. Both C and assembly-level software implementa¬ 
tions are considered to compare the two techniques. In a similar vein a 6809 and 
68008-based system are designed in order both to emphasize the portability of a 
high-level language and to illustrate the feasibility of C as the language of choice 
in an 8-bit target, as well as the more traditional 16/32 bit product. 

As well as an exercise in programming in C, the project is used as a vehicle 
to examine some of the products that are available to aid in the investigation of 
software veracity and also the interaction between hardware and software. 



CHAPTER 1 1 


Preliminaries 


Trend monitoring is a common instrumentation requirement. The aneroid record¬ 
ing barometer, providing hard copy of typically a month's atmospheric pressure, 
is an everyday example. The techniques used to acquire and display such data in 
any particular situation depend on the signal characteristics. 

Short-duration non-repetitive events are typically captured and displayed on 
a storage oscilloscope. Until relatively recently, electrostatic storage cathode ray 
tubes (CRTs) were used to combine both memory and display functions. Most 
current equipment uses semiconductor RAM for storage, in conjunction with a 
CRT display. Digital storage oscilloscopes have the advantage that acquired data 
can be subsequently read out to a computer for analysis and onto an X-T recorder 
for hard copy. Single-shot events lasting from 10 8 to 10 8 seconds can readily 
be accommodated. 

Repetitive events require a slightly different approach. Normally it is neces¬ 
sary to view the signal for the last few cycles. For events lasting from several 
seconds upwards, a chart recorder gives both storage and display; for example 
the aneroid barometer. The maximum pen writing speed (slew rate) of typically 
500 cm/s places a lower limit on the event cycle duration. Even where the writing 
speed is adequate, fast events can lead to records physically occupying a great 
deal of space. 

Where fast sub-millisecond repetition cycle signals are to be observed, the 
oscilloscope is the instrument of choice. The timebase is triggered at some unique 
point on the waveform and adjusted to display the duration of interest. Although 
the cycle time can be very short indeed, any changes must be long term in order 
to be observed. 

The standard oscilloscope display relies on the persistence of human vision, 
which sees' images repeated at a rate of 50 per second or more as flicker free; 
the principle of television and the cinema. Slow timebase rates (where the scan 
duration is measured in seconds) give a moving-spot display, as the luminance 
lifetime of standard CRT phosphor is typically 10 ms. Long-persistence phos¬ 
phors are available, and the classic example of an application of this technique is 
radar. Figure fTTTTl shows a simulated trace of the electrical activity of the human 
heart using a long-persistence CRT. The no mi nal period of this electrocardiogram 
(ECG or EKG) signal is between 20 and 180 beats/minute. Using a timebase of 
200ms/cm, gives a potential 2-second record length. 

The main problem here, besides the variable brightness trace, is the fixed 
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Figure 11.1 A typical long-persistence display. 

nature of the phosphor's luminance lifetime; thus the CRT must be selected with 
the application in mind. A digital solution, in conjunction with a standard CRT, 
provides a much more flexible solution. Using a MPU to control the acquisition, 
storage and display of the data, means that additional features, such as freeze, 
back spacing and signal processing, can also be accomplished. Furthermore, once 
the data is in situ it can be used in ways not related to the display function. 

In essence, we need to continuously sample the signal at a suitably slow rate, 
while concurrently scanning and displaying several seconds' worth of past data at 
a faster rate suitable for the human eye. A typical sequence, showing the resulting 
scrolling trace, is shown in Fig. 111.21 This diagram shows file snapshots taken 
at \ window intervals. A window here is defined as the time past shown on the 
display. The most historical data is shown to the left of the display, and this 
results in the trace scrolling to the left as new data is acquired. In implementing 
this process for our project, we will have created a time-compressed memory. 
Unlike Fig. lll.il this technique does not rely on the phosphor luminance lifetime. 
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11.1 Specification 

The customer specification is the rock on which the enterprise is built. As such, 
it should be treated with the same respect afforded to the foundation of any 
building. 

The product request will normally originate either from the customer, or as 
a projected need from marketing personnel. Unless the objective is the exact 
replacement of a product already on the market, for example a central heating 
controller, such a request is likely to be couched in the language of the application 
rather than in technical terms. There will be obvious boundary constraints of both 
a financial and technical nature, but other concerns may well involve complying, 
say, with legal rulings, such as medical safety requirements. 

In essence, the design team must tease as much information as possible from 
the originator; take away the request and return with a set of proposals. This will 
involve consideration of the following questions: 

• What is it to do? 

• Is it possible? 

• How is it to be done? 

• Can the request be modified? 

The outcome of these deliberations is communicated to the customer, and 
after several iterations a concrete specification will emerge, provided that the 
project is thought viable. It is important that the specification be decided at this 
point, not least to avoid the phenomena of creeping featurism'. The document 
will be used as the basis of a suitably detailed implementation, culminating in a 
working prototype. It may even be used as a legal document, should litigation 
occur! 

With this discussion in mind, let us begin the process with the specification 
on which our project is based. The customer has asked us to construct a portable 
ECG/EKG monitor with the following outline specification: 

1: Input 

Three-lead ECG/EKG signal with integral amplifier having a bandwidth of 0.14 Hz 
to 50 Hz. 

2: Output 

100 mm (4") width standard CRT, displaying a nominal two seconds worth of 
data. 

3: Data accuracy 
±0.5% of full scale. 

4: Display resolution 
Better than 0.5 mm. 

5: Facilities 

Freeze on demand. Sampling variation of -50% to + 100% around no mi nal. 
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A prototype circuit is to be built, to demonstrate the feasibility of the pro¬ 
posal and to win customer approval. It will allow simple field trials to be under¬ 
taken. The prototype will use commercial power supplies and an oscilloscope as 
a display. These standard components can be bought in or designed in-house 
according to production design considerations. 

If we make an initial decision to choose a MPU-based implementation as our 
starting point (see also Section 11.2), then we can do a primary feasibility study 
on paper. 

With an upper frequency of 50 Hz, Shannon's sampling theorem tells us that a 
minimum of 100 samples will be required per second, say, 128 as around (binary) 
number. For a 2-second record length, this requires 256 data points. 

At 128 samples per second, the sample period is 7.8 ms. As a first strategy, 
we could have one complete scan across the CRT in this time, and thus a new 
sample would be taken each screen scan (probably during flyback). Allowing 
lms for flyback, the 256 samples would be displayed in 6.8ms; giving 26.6 ps 
per dot on the screen. In this time, the microprocessor would need to increment 
the X co-ordinate, get the next sample from the array, and send out the X and Y 
co-ordinates. This is probably getting close to the limit of what a general-purpose 
MPU is capable of, especially if a simpler microcomputer unit (see Section 11.2) 
is chosen. As the scanning rate here is 128 per second, we can afford to consider 
two new samples during each full screen scan. At 64 scans per second, this is 
still well above the flicker rate, but the trace will jerk two dots left after each scan 
(see Fig. 111. 21 ). We now have 7.8 x 2 = 15.4ms (two sample periods) for the scan 
plus flyback; that is 14.4 ms per scan. Dividing by 256 gives 56 ps per screen dot, 
which is a satisfactory compromise. 

In summary, for a time-compressed memory complying with the customer 
specification, as shown in Fig. lll.3l we have: 

1. Sampling rate 128 samples/second 

2. Memory capacity 256 samples (for a 2-second frame) 

3. Scan rate 64 frames/second 


4. Flyback time 1 ms no mi nal 

5. Time between steps 56 ps, at two samples per scan 

The average adult has a resting heartrate of 72 beats per minute (0.83 Hz), with 
a variation between 40 and 180 beats per minute over all conditions. Although the 
frequency range is essentially contained in the range 0.14 Hz to 50 Hz (l], most of 
the energy lies below 20 Hz. Thus the 128 Hz sampling rate will give at least six 
samples per cycle, which is just adequate for reasonable visual representation. 
Increasing the sampling rate to, say, 512 Hz, would require a 1024-word data 
store and consequently a 10-bit digital to analog converter. Furthermore, eight 
samples per scan would be needed to keep the dot rate on the screen the same. 
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Remember that the dot rate is the time used by the MPU to get and send out the 
new X and Y values to the CRT amplifiers. 

Of course designing a prototype and subsequent modifications is only the 
beginning of the process. Setting up a production line is expensive, and the al¬ 
ternative of subcontracting all or part of this activity is one of the major design 
decisions that will be taken at this point. With the assumption of in-house man¬ 
ufacture, which is only really feasible for large scale production, the next stage is 
the construction of several preproduction prototypes. In making a few units, as 
if for sale, the production team will be verifying that the system can be economi¬ 
cally built on an assembly line. Electronic devices are relatively standard, but me¬ 
chanical components, such as printed-circuit boards, switches, connectors, case 
and artwork are somewhat variable. Decisions must be made regarding methods 
of construction, second-sourcing of components, stock levels and even down to 
whether to use surface mount or sockets for the integrated circuits. Just as im¬ 
portant, but often overlooked, is how and when to test components, subsystems 
and the final product. 

The production literature covers assembly details and wiring patterns. In 
some cases programs for computer-aided manufacture (CAM) facilities will be 
covered under this heading. Included in this category is the testing documenta¬ 
tion. This may be either a tester's manual or software for automatic test equip¬ 
ment (ATE). 

Post-production documentation covers service manuals and of course the user's 
handbook. The quality of this material will often add considerably to the cus¬ 
tomer's satisfaction, which hopefully will eventually increase the reputation of 
the manufacturer and eventually increase sales. 


11.2 System Design 

As shown in Fig. 1 11.41 there are several critical steps between agreeing a specifi¬ 
cation and actually getting down to the minutia of hardware/software design f21 . 
One global decision involves the selection of the system transducers, since these 
form the interface between the electronics and the real world. These will be cho¬ 
sen on the basis of an analysis of the parameters involved, together with their 
measurement and interconversion to an analogous electrical quality. In our case, 
standard ECG/EKG configured pads (see Fig. 111.41 sense the bioelectrical poten¬ 
tials and a 100 mm CRT acts as the output device. 

The choice of transducer is not unduly influenced by the technology which 
will be used for the central processing electronics. However, their selection at 
this time, coupled with a system task analysis, will permit a speculative block 
diagram to be made of the system. This system formulation is shown for our 
specific project in Eig. ll 1.41 

At this point some thought can be given to the technology of the central elec¬ 
tronics. At the functional level, this will involve a partition between the analog 
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Figure 11.3 Block diagram of the electrocardiograph time compressed memory. 
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and digital processes. For example, should an input signal be filtered before the 
A/D conversion (analog filter) or after (a digital software filter)? 

At the digital end of things, the choice essentially lies between random logic 
(hard-wired digital combinational and sequential circuitry) and programmable 
logic (microprocessor-based software-directed hardware). Conventional logic is 
often best for small systems with few functions, which are unlikely to require ex¬ 
pansion. Indeed the present project was based on a random-logic time-compressed 
memory predecessor. In larger mass-produced products, this type of logic may 
appear in the guise of programmable arrays, semi and fully custom-designed in¬ 
tegrated circuits. 

Microprocessors work sequentially doing one thing at a time, while random 
logic can process in parallel. Thus, where nanosecond speed is important, con¬ 
ventional logic is indicated (but note that analog electronics is even faster). It is 
possible to run many microprocessor chips in parallel, the transputer being the 
seminal example. The conventional approach uses mixed logic with a micropro¬ 
cessor in a supervisory role controlling the action of supporting random logic 
and analog circuitry. 

In keeping with the objective of this book, we choose a microprocessor-based 
implementation. In such cases, the processing tasks must be partitioned between 
hardware and software. As an example, consider an extension to our specifica¬ 
tion, where the time between ECG/EKG peaks is continually measured, and is to 
be displayed on a separate alphanumeric readout. Now we have a choice between 
using an expensive intelligent display, which incorporates an integral ASCII de¬ 
coder E), or a cheaper dumb display, where the segment patterns are picked out 
by software. The former will cost more on a unit basis, but the latter will require 
money before the product is launched, to design the software-driver package. 
This of course is a fairly trivial example, but in general hardware is available off 
the shelf and therefore has a low initial design cost and takes some load off the 
central processor. At this level, software is rarely obtainable off the shelf and re¬ 
quires initial investment in a (fairly highly paid) software engineer, but is usually 
more flexible than a hardware-only solution. In some cases, technical consid¬ 
erations rule out one or other approach. Thus in our example, it is likely that 
the processor will not have time both to display the waveform and to pick out 
the peak (a more difficult task than it seems) in software. Hence, external peak¬ 
picking hardware is indicated, as shown in Fig. 16.11 Of course, this hardware could 
be another MPU running a peak-detection software routine! Thus, when techni¬ 
cally feasible, a software-oriented solution is indicated for large production runs, 
where the initial investment is amortized by a lower unit cost. 

With a provisional task allocation between hardware and software, what choices 
has the designer available in implementing the hardware? There are three main 
approaches to the problem. 

In situations where the ratio of design cost to production numbers is poor, 
a system implementation should be considered. This entails using a commer¬ 
cial microcomputer, such as an IBM PC, as the processor. Such instruments are 
normally sold with keyboard, VDU, magnetic and random access memory, which 
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Software test 
and design 




Figure 11.4 A broad outline of system development. 
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are sufficient for most tasks. Ruggedized rack mounting industrial and portable 
versions are also available. Generally, the hardware engineer will be concerned 
only to customize the system, by designing specialist supporting hardware and 
interface circuitry. The software engineer will create a software package based 
on this microcomputer, which will drive the hardware. The microcomputer will 
support commercially available development packages, such as editors, assem¬ 
blers, compilers and debuggers, which facilitates the software design process at 
low cost. 

Tailoring a general purpose machine to a semi-dedicated role requires a rela¬ 
tively low investment up-front and low production expenses. Furthermore, doc¬ 
umentation and the provision of service facilities are eased, as a pre-existing 
commercial product is used. Technically this type of implementation is bulky, 
but where facilities such as a disk drive and VDU are needed, the size and unit 
cost are not necessarily greater than a custom-designed equivalent. Sometimes 
the customer may already possess the microcomputer; the vendor simply selling 
the hardware plug-in interface and software package. This can be an attractive 
proposition for the end user. 

Thus a system-level implementation is indicated when low-to-medium produc¬ 
tion runs are in prospect and the system complexity is high; for example com¬ 
puterized laboratory equipment. For one-offs this approach is the only economic 
proposition, provided that such a system will satisfy the technical boundary con¬ 
straints. For instance, it would be obviously ridiculous (but technically feasible) 
to employ this technique for a washing-machine controller. 

At the middle range of complexity, a system may be constructed using a 
bought-in single-board computer (SBC). Sometimes several modules are used (eg. 
MPU, memory, interface), and these are plugged into a mother-board carrying 
a bus structure. If necessary these may be augmented with in-house designed 
cards to complete the configuration. 

Although the cost of these bought-in cards is many times that of the ma¬ 
terial cost of the self-produced equivalent, they are likely to be competitive in 
production runs of up to around a thousand. Like system-implemented config¬ 
urations, they considerably reduce the up-front hardware expenses and do not 
require elaborate production and test facilities. By shortening the design time, 
the product can be marketed earlier, and subsequently the econo mi cs improved 
by substitution of in-house boards. W hi lst more expensive than a system based 
on a commercial microcomputer, a board-level implementation gives greater flex¬ 
ibility to configure the hardware to the specific product needs. Furthermore, in a 
multiple-card configuration, at least some of the standard modules can be used 
for more than one product (e.g. a memory card) thus gaining the cost benefits of 
bulk buying. Reference (4j gives an example of this approach. 

Neither system or board-level implementations provide an economical means 
of production for volumes much in excess of a thousand, with the exception of 
high complexity-value products. In many cases, technical demands, such as size 
and speed, preclude these techniques even for small production runs. In such 
situations, a fundamental chip-level design, as outlined in Fig. 111.51 is indicated. 
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Hardware path 
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Software path 
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Figure 11.5 Fundamental chip-level design. 
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Chip-level involves implementation at the integrated circuit (or even silicon) 
level. In this situation, the circuit is fabricated from scratch, giving a config¬ 
uration dedicated to the specific application. Given that the designers require 
an intimate knowledge of, for example, a spectrum of integrated circuits, CAD 
techniques for PCB and ASIC design and software techniques, and must keep an 
eye on the eventual production process, the cost of developing, testing and pro¬ 
duction of such systems is large. Furthermore, the up-front expenses are high 
and may cause cash flow problems. However, the materials cost is low, and this 
approach is often the best technical solution to the problem. 

Software implementation for most dedicated MPU-based systems, irrespec¬ 
tive of the hardware implementation category, is developed in an analogous way 
to chip-level design. This is emphasized in Fig. |11.5| by the two parallel tracks. 
Other than monitors and operating systems, there is little in the way of off-the- 
shelf firmware packages. There is, however, a considerable body of high-level 
routines published, which can be adapted and compiled down to the chosen tar¬ 
get. Nevertheless, in general, software design is an expensive proposition, and is 
difficult to amortize in small production runs. 

Standard microprocessor-based circuitry uses a MPU chip, together with indi¬ 
vidual ICs for memory and I/O interface. Address decoding and other support 
logic (glue logic) is likely to be implemented using programmable logic devices to 
reduce the chip count. Higher production runs may economically utilize single¬ 
chip microcontroller units (MCUs). MCUs are single devices with the MPU, RAM, 
ROM, I/O interface and address decoder all integrated together 0. As an exam¬ 
ple, the 68HC11E9 MCU has an 8-channel A/D converter, two serial and up to four 
parallel ports, timers, 256-byte RAM, 512-byte EEPROM, 12 kbyte of ROM/EPROM 
and a watch-dog timer on the one chip! The processing power of MCUs is gener¬ 
ally less than their MPU equivalents, but newer devices such as the 68HC16 and 
the 68300 series (software compatible with the 68020 MPU) are somewhat more 
powerful (and expensive!). 

Compilers for the more common MCUs (e.g. 6805, 6811, 6816, 8051) are avail¬ 
able, but frequently have restrictions due to the limited architectures of the core 
processor. Table fTTT. 151 a). (f) and (g) are examples of 6801, 6816 and 6811 MCU 
compiled output respectively. Note especially the length of the 680 l's code, it 
being one of the older devices which did not have a progra mmi n g structure de¬ 
signed with a high-level language implementation in mind. 

It is possible to integrate the fabrication pattern of some of the simpler MPUs 
onto your own custom IC, such as the 6805 device. These are held in the library 
of the appropriate CAD package. Provided that memory and other patterns are 
available, in principle a custom MPU-based system can be integrated onto one 
silicon chip. However, this approach is only applicable to very large scale pro¬ 
duction runs and requires considerable skill. Custom hard-disk controllers are 
typical of products which use this technique. 

Figure 111.61 gives an idealized picture of the interaction of the various tech¬ 
nologies with regard to production levels. These relationships make no presump¬ 
tion as regards the technological advantages of the different approaches. 
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Figure 11.6 A cost versus production comparison. 

References 

[1] Riggs, T., et al.; Spectral Analysis of the Normal Electrocardiogram in Children and 
Adults, J. Electrocardiology , 12, no. 4, 1979, pp. 377- 379. 

[2] Wilcox, A.D.; 68000 Microcomputer Systems, Prentice-Hall, 1987, Part 1. 

[3] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987, Appendix 3. 

[4] Blasewitz, R.M. and Stern, F.; Microcomputer Systems, Hayden, 1987, Section 9.5. 

[5] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987. 





CHAPTER 12 


The Analog World 


Most real-world parameters are analog in nature. Some examples are tempera¬ 
ture, pressure and light intensity. An analog parameter is a continuum — li mi ted 
in practice between an upper and lower level. Thus a dry-bulb thermometer can 
be read to whatever resolution is necessary, between, say, -10°C and +180°C. Be¬ 
low this, the mercury disappears into its bulb, and above this the top of the tube 
is blown off! Theoretically, the quantum nature of matter sets a lower limit to the 
continuous nature of things, but in practice noise levels and the limited accuracy 
of the device generating the signal sets a realistic upper limit to resolution. 

Digital circuitry deals with patterns of symbols, which represent amplitudes. 
Depending on the number and type of digits making up the pattern, only a finite 
total of values are possible. Most of us tend to use denary (decimal) digits to 
represent our numbers, while computers prefer binary. Patterns made up of 
eight binary digits can represent up to 2 8 (256) discrete values, whilst 16 bits can 
resolve down to ( 65336 ) of full scale (see Table ITZTl . 

Physically, bits are represented in hardware as two values of a signal parame¬ 
ter. Most commonly this is voltage, and typical ranges are 0 - 0.8 V and 2.0 - 5.0 V 
for logic 0/logic 1 respectively. Other values, such as ± 12 V, and parameters such 
as frequency, for instance 1.2/2.4kHz, are in regular use. Digital signals are in 
fact analog signals with analog characteristics, such as finite transition times 
and noise. Although digital systems designers must be cognizant of these sig¬ 
nal properties, they are normally regarded as secondary effects, caused by the 
intrusion of an imperfect world! 

Given that we wish to do our processing using digital techniques, in our case 
a microprocessor, conversion to and from analog signals is necessary. The aim 
of this chapter is to overview this process, with an eye to our specific project. 
As well as the A/D and D/A converters themselves, we will look at some of the 
consequences of the digitization of analog signals. 


12.1 Signals 

Our project specifically targeted the adult ECG/EKG signal as the system input. 
The physical origins of this signal and the use of electrodes as the sensor are 
outside the scope of this text; the interested reader is directed to references (TJ [2] 
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for this information. The most common configuration measures the potential 
difference across the chest, the LA (left arm) and RA (right arm) leads in Fig. lll.3l 
The RL (right leg) is used as the reference point. As the RA-LA potential is rarely 
more than a few millivolts, the following amplifier must provide differential gain 
to the order of 1000 (60 dB). However, there is more to this stage than just gain. 

A good ECG/EKG amplifier must have the following properties, in addition to the 
usual requirements of linearity, slew rate and frequency response: 

1. A common-mode rejection ratio (differential gain/common-mode gain) of at 
least 80 dB. Common-mode signals arise from interference from external sources, 
typically mains hum. Such extraneous signals are important, as they are usu¬ 
ally much greater than the signal of interest. Signals appearing on both leads 
should not affect a differential amplifier's output, but in practice there will be 
some feedthrough. 

2. Suppression of baseline drift. Large, essentially d.c., voltages can appear 
across the electrodes, due to electrolytic action at the skin interface. These 
are not constant, but change slowly with time. Straight amplification by 1000 
would cause the amplifier to saturate. 

3. As a safety requirement, leakage current between electrodes, that is through 
the body, to be less than 10 pA 0. Because of this, an isolating amplifier is 
recommended. This uses a front end with optical or transformer coupling to 
the normally-powered main gain block. This front end can either be battery 
powered or supplied through an isolating power supply (sometimes integral 
with the amplifier). 

4. Protection against pacemaker spikes and, if applicable, defribrillator surges of 
around 2 5 kV! 

Because of safety considerations 0, 1 have resisted the temptation of describ¬ 
ing an ECG/EKG amplifier here. A range of commercially available isolating am¬ 
plifiers is available for biomedical applications, a typical example being the Burr 
Brown ISO 1 OOP. Either disposable or reusable silver/silver chloride electrodes are 
available from any medical supply house If necessary shave and clean the skin 
with surgical spirit before application. However, all the hardware and software 
to be presented can be fully and safely tested in comfort using a sinusoidal or 
function generator. 

Auxiliary circuits, such as filters, level shifters and sample and hold circuits, 
lumped together in Fig. |11.3| as the signal conditioning process, are discussed at 
the appropriate point later. 

Quantizing a signal obviously distorts the original information. In essence, 
quantizing is the comparison of the analog quantity with a fixed number of levels. 
The nearest level is then the value taken in expressing the original in its digital 
equivalent. Thus in Fig. ll2.ll an input voltage of 0.4285 full scale is 0.0536 above 
quantum level 3 and 0.0714 below level 4. Its quantized value will then be taken 
as level 3 and coded as 01 1 b in a 3-bit system. 

The residual error of -0.0536 will remain as quantizing noise, and can never 
be eradicated (see Fig. ll2.2I d)). The distribution of quantization error is given at 
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Figure 12.1 The quantization process. 


the bottom of Fig. 112.11 and is affected only by the number of levels. This can 
simply be calculated by evaluating the average of the error function squared. The 
square root of this is then the root mean square (r.m.s.) of the noise. 


, L L 

fix) = --X + 2 

The mean square is: 
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Giving a r.m.s. noise value of -= = 

A fundamental measure of a system's merit is the signal to noise ratio. Taking 
the signal to be a sinusoidal wave of peak to peak amplitude 2 n L (see Fie. 11 2.2) . 

(—) k 

we have an r.m.s. signal of , that is Thus for a binary system with 

n binary bits, we have a signal to noise ratio of: 



2 n Vl2 

2V2 


1.22 X 2 M 


In decibels we have: 


S/N = 201og 1.22 X 2 n = 6.02 n + 1.77dB 

The dynamic range of a quantized system is given by the ratio of its full scale 
(2 n L) to its resolution, L. This is just 2 n , or in dB, 201og 2 n = 20n log 2 = 6.02 n. 
The percentage resolution given in Table fT/Tl is of course just another way of 
expressing the same thing. 


Table 12.1 Quantization parameters. 


Binary bits 

Quantum levels % resolution 
(2 n ) 

Resolution 
Dynamic range 

S/N ratio (dB) 

4 

16 

16.25 

24.1 dB 

26.9 dB 

8 

256 

0.391 

48.2 dB 

49.9 dB 

10 

1024 

0.097 

60.2 dB 

61.9 dB 

12 

4096 

0.024 

72.2 dB 

74.0 dB 

16 

65,536 

0.0015 

96.3 dB 

98.1 dB 

20 

1,048,576 

0.00009 

120.4 dB 

122.2 dB 


The exponential nature of these quality parameters with respect to the number 
of binary-word bits is clearly seen in Table [701 However, the implementation 
complexity and thus price also follows this relationship. For example, a 20-bit 
conversion of 1V full scale would have to deal with quantum levels less than 1 pV 
apart. Compact disks use 16-bit technology for high quality music. Pulse-code 
modulated telephonic links use eight bits, but the quantum levels are unequally 
spaced, being closer at the lower amplitude levels. This reduces quantization 
hiss where conversations are held in hushed tones! Linear 8-bit conversions are 
suitable for most general purposes, having a resolution of better than ±\%>. Ac¬ 
tually video looks quite acceptable at a 4-bit resolution, and music can just be 
heard using a single bit (i.e. positive or negative)!! 
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The analog world treats time as a continuum, whereas digital systems sample 
signals at discrete intervals. The sampling theorem El states that provided this 
interval does not exceed half that of the highest signal frequency, then no infor¬ 
mation is lost. The reason for this theoretical twice highest frequency sampling 
limit, called the Nyquist rate, can be seen by examining the spectrum of a train 
of amplitude modulated pulses. Ideal impulses (pulses with zero width and unit 
area) are characterized in the frequency domain as a series of equal-amplitude 
harmonics at the repetition rate, extending to infinity |5]- Real pulses have a 
similar spectrum but the harmonic amplitudes fall with increasing frequency. 

If we modulate this pulse train by a baseband signal A sin to/1, then in the 
frequency domain this is equivalent to multiplying the harmonic spectrum (the 
pulse) by A sin uo ft, giving sum and different components thus: 

A R 

Asinco/t x B sin coht = — (sin(tOh + co/)t + sin(tOh - co/)t) 

More complex baseband signals can be considered to be a band-limited (f m ) 
collection of individual sinusoids, and on the basis of this analysis each pulse 
harmonic will sport an upper (sum) and lower (difference) sideband. We can see 
from the geometry of Fig. ll2.2f b) that the harmonics (multiples of the sampling 
rate) must be spaced at least 2 x f m apart, if the sidebands are not to overlap. 

A low-pass filter can be used, as shown in Fig. ll2.21 d). to recover the baseband 
from the pulse train. Realizable filters will pass some of the harmonic bands, 
albeit in an attenuated form. A close examination of the frequency domain of 
Fig. 112. 2I d) shows a vestige of the first lower sideband appearing in the pass 
band. However, most of the distortion in the reconstituted analog signal is due 
to the quantizing error resulting from the crude 3-bit digitization. Such a system 
will have a S/N ratio of around 20 dB. 

In order to reduce the demands of the recovery filter, a sampling frequency 
somewhat above the Nyquist li mi t is normally used. This introduces a guard 
band between sidebands. For example the pulse code telephone network has an 
analog input bandlimited to 3.4 kHz, but is sampled at 8 kHz. Similarly the audio 
compact disk (CD) uses a sampling rate of 44.1 kHz, for an upper music frequency 
of 20 kHz. This means that with a 16-bit sample and 70 minute play period, a CD 
must store around 3000 Mbits! 

A more graphic illustration of the effects of sampling at below the Nyquist 
rate is shown in Fig. 112.31 Here the sampling rate is only 0.75 of the baseband 
frequency. When the samples are reconstituted by filtering the resulting pulse 
train, the outcome, shown in Fig. |12.3[ b), bears no simple relationship to the 
original. This spurious signal is known as an alias. 

Returning now to our project, we have established that the sampling rate will 
be 128 per second. As the baseband of interest is limited to 20 Hz, this seems 
to give us a Nyquist margin of around 300%. However, the ECG/EKG signal does 
have components beyond 1 kHz; and noise, both from external and internal (e.g. 
muscle noise) sources, will have a spectrum extending well above the Nyquist 
li mi t. Thus, an anti-aliasing filter will be required as part of the front end signal 
conditioning process. 
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Time domain Frequency domain 



(a) Analog signal. Baseband 



(b) Sampled signal plus analog to digital conversion. Harmonics and sidebands 


.111 



(c) Reconstituted analog signal (digital to analog conversion). 



Figure 12.2 The analog-digital process. 
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Figure 12.3 Illustrating aliasing. 


Many filter designs exist. That shown in Fig. I12.4l is a 4th-order Butterworth 
low-pass filter using the multiple-feedback configuration. The overall gain in the 
passband is designed to be unity, with the -3dB frequency at 24 Hz. Design 
equations and other relevant data is given in reference Q3). In practical situa¬ 
tions, component tolerances will cause wide deviations from this figure; this is 
especially true of the large capacitors used for low frequencies. Reference (6] 
gives a tuning procedure using R ] R 3 , where more precise results are required. 
The transfer characteristic of Fig. 11 2.41 bl is a real transfer of an untuned cir¬ 
cuit, using 0.1% resistors and ±1% capacitors. Actual preferred values, as shown 
bracketed, were used. From this characteristic, the gain at the Nyquist frequency 
of 64 Hz is -32 dB down from the passband. 


12.2 Digital to Analog Conversion 

Digital to Analog (D/A) conversion can be defined as the production of an analog 
signal whose amplitude is proportional to the quantitative magnitude of the input 
digital word. Natural binary code is weighted in ascending order of powers of two. 
Thus a 4-bit binary number can be written as: 


A = (b 3 x 2 3 ) + (b 2 x 2 2 ) + (b, x 2 1 ) + (b 0 x 2°) 
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Cl 1 

= 10n 

R11 

= 270k 

C21 = 

10n 

R21 = 

91k 

Cl 2 

= 33n 

R12 

= 470k 

C22 = 

220n 

R22 = 

200k 



R13 

= 270k 



R23 = 

91k 



R1 x 

= 330k 



R2x = 

240k 


(a) The circuit diagram. 
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(b) Transfer characteristic. 
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Figure 12.4 A 4th-order anti-aliasing filter. 

where b n is the nth binary digit, either 0 or 1. In general: 

M-l 

A = £ b k X 2 k 
k=0 

for an M-digit word. 

With this definition in mind, the use of a suitably weighted resistor network is 
suggested. Four resistors, each switched to V or earth by one digital bit, feeding 


900 

800 
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the virtual earth of an operational amplifier's summing junction, will give a com¬ 
posite analog output, which is a function of the digital pattern and the resistor 
values. Thus using an 8kQ resistor driven by b 0 , 4kQ driven by b|, 2kQ by b 2 
and 1 kQ by b 3 will feed currents in ascending orders of two into the summing 
junction. 

In a practical situation, the use of weighted resistors, leads to severe accu¬ 
racy problems. These are the result of the wide range of resistance values; for 
example, in a 12-bit system, if b 0 switches in lkQ then bn will have to switch 
in a 2.048 MQ resistor. As well as the problem of matching the ratio of all these 
resistors, the precision analog switches have to carry an equally wide spread of 
currents. 

One way around the matching problem is to use a ladder network, such as 
shown in Fig. ll2.Si a). Looking left at node A we ' see' a resistance 2R. At node B 



(b) Current to voltage converter with optional bipolar output. 


Figure 12.5 The R-2R current D/A converter. 
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we have R + 2R || 2R = 2R. By symmetry, the resistance looking left is always 2R. 
To a precision reference voltage V ref at node D, the ladder appears to be this 
2R resistance in parallel with the 2R resistor to switch b 3 . If the output is short 

circuited to ground, then irrespective of the switch position a current 7 ref = 
will flow into node D from V ref , of which 50% will flow down through the switch. 

The current arriving at node C similarly splits into two parts, with ^ ef being 
switched by b 2 . By continuing this line of argument, it can be seen that each 
switch leftwards controls half of its neighbour's current. The output current is 
then: 


T D 

I Q = ^ b|< x 2 k where b|< = 1 or 0 

6 k=0 

which is the desired transformation ratio. 

The network can be extended ad infinitum by replacing the termination re¬ 
sistor by the requisite ladder sections. Irrespective of the number of bits, the 
absolute value of resistor is irrelevant; only the 2:1 ratio matters. This is impor¬ 
tant where the ladder is implemented on a monolithic silicon integrated circuit, 
where accurate absolute values are impossible to fabricate, but ratios are accu¬ 
rately implemented. 

The switches shown in the diagram are of course electronic and controlled by 
the digital bit pattern. In long words, where the ratio of switchable current is very 
large, alternative ladders are available which have the property of equal currents 
but different voltages 0. Capacitor-based ladders can also be used. 

The output quantity of this network is current. This can conveniently be con¬ 
verted to voltage by feeding this into the inverting input of a feedback operational 
amplifier, as shown to the left of Fig. ll2.51 b). This appears as a virtual earth to 
the network, and gives an output voltage of -J 0 R. A further stage inverts this 
negative-going voltage, giving V Q = I 0 R. 

This final stage may be optionally used to subtract half full-scale voltage, to 
give a bipolar output. Examining Fig. 112.61 b) shows that the most significant 
bit (b 3 ) acts as a sign bit with 1 for positive and 0 for negative. The remaining 
magnitude digits are identical to the equivalent 2's complement code. Thus a 
computer working in 2's complement code need only invert the sign bit before 
outputting to the D/A converter, to give a true bipolar output. This modified 2's 
complement code is known as offset binary code. 

Real D/A converters have characteristics which differ from the ideal transfer 
relationship; as shown in Fig. 112.71 f8l . As an example, consider an 8-bit D/A 
converter going from 0111 1 1 1 1 b to 1 0000000b. In all, eight switches must 
change. Unless these are exactly matched, and their ladder legs too, it is inevitable 
that a blip on the characteristic will occur. If this mismatch is more than one bit 
(less than 0.4% in this example) then it is possible that the trend (larger binary 
codes give larger analog outputs) will be violated. In such cases, the converter 
is non-monotonic. Besides these non-linear errors, the converter may exhibit a 
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Vo 
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Output voltage 
Fraction of V re f 
0 

1/16 

2/16 

3/16 

4/16 

5/16 

6/16 

7/16 

8/16 

9/16 

10/16 

11/16 

12/16 

13/16 

14/16 

15/16 


Input digital code 
b 3 b 2 bi b 0 

0 0 0 0 
0 0 0 1 
0 0 10 
0 0 11 
0 10 0 
0 10 1 
0 110 
0 111 
10 0 0 
10 0 1 
10 10 
10 11 
110 0 
110 1 
1110 
1111 


(a) Natural unipolar code conversion. 
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0 

0 

0 
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0 

0 

1 

0 
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0 

1 

1 
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0 

1 

0 

0 
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0 

1 

0 

1 
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0 

1 

1 

0 
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0 

1 
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1 

0 

0 

0 
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1 

0 

0 

1 
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1 

0 

1 
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0 

1 

1 
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1 

0 

0 
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1 

1 

0 

1 

+ 6/16 

1 

1 

1 

0 

+7/16 

1 

1 

1 
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Figure 12.6 Conversion relationships for the network of Fig. [TZb] 


constant offset and gain error. Unlike the non-linear errors, both these linear 
errors may be trimmed out using the operational amplifier buffers. 

Manufacturers specify their error figures in different ways. For example the 
AD7528's relative accuracy is measured as the maximum deviation of any code 
from the ideal, with offset and gain errors eliminated. Depending on the version, 
this is given as ±1 bit and ±| bit. Differential non-linearity is the maximum 
difference between the ideal 1-bit change expected between any two adjacent 
codes and the actual measured value. This is given as ± 1 bit, and therefore the 
converter is guaranteed monotonic (just!). Gain error is the worst case full-scale 
error due to offset and gain tolerance. It can be as high as ±6 bits, but is easily 
trimmed out if need be. 
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Vo 



O <0 t-H o <0 O* O t-H 0> <0 t-H 

(^) (^) (^) t-H t—H H t—H (^) (^) (^) (^) T—i T-H t-H i-H 

O O CD <0 (O O O <0 -'—I i-H i-H 

Figure 12.7 A real-world transfer characteristic. 

The output port chosen for our project is the Analog Devices AD7528 dual 
8-bit D/A converter. This provides for both X and Y analog channels in the one 
device. The AD7528 is microprocessor-compatible, with two integral 8-bit trans¬ 
parent latches. Besides any necessary operational amplifier network, only an 
external precision reference voltage is required. 

The heart of each converter is a current R-2R ladder network, which is essen¬ 
tially an 8-bit version of Fig. ll2.5f a). From Fig. ll2.8l we see that an additional R 
resistor connected to the output node is also provided (at pins 3 and 19), designed 
to be used as the feedback resistor of the first amplifier stage of Fig. ll2.51 b). Its 
use is illustrated in Fig. 112.91 

The original specification in Section 11.1 called for 0.5% resolution and ±0.5% 
full-scale accuracy. An 8-bit system gives ^ 0.4% resolution, thus a ±1 bit 

non-linear accuracy will fall within our target. 

From Section 11.1, we have estimated a data rate interval to both D/A con¬ 
verters of 56 ps. The settling time for a step between zero and full current 
(00000000b ~ 11111111 b) is 400 ns maximum to within a \ bit of the final 
value (supply voltage V DD of +5 V and V ref of +10 V). This is around 0.7% of the 
step period. Of course the amplifier circuitry converting this current to voltage 
will worsen this figure. 

Both channels are independent, with separate analog sections and 8-bit trans¬ 
parent latches driving the ladder switches. DACA/DACB (pin 6) directs data from 
the MPU bus through to the appropriate 8-bit latch whenever both the Chip_Select 
(CS) and Write (WR) pins are low. In driving DACA/DACB from a 0 , the AD7528 
looks to the MPU as two ordinary 8-bit digital output ports located at adjacent 
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Please insert Fig. 12.8 here 


Figure 12.8 The AD7528 dual D/A converter. Reprinted with the permission of Analog Devices, Inc. 

addresses. An address decoder line enables the CS input, whilst a strobe of some 
kind activates WR when sending data. In the 6809 MPU, the Q inverted clock 
is normally used for this purpose (see Fig. 11.71) . whilst DS fulfils this role for the 
68008 device. The 68000 MPU would use UDS or LDS as appropriate, see Fig. |3.10[ 
With a V DD of +5 V, all digital signals are TTL and therefore MPU compatible. The 
AD7528 response times are (just!) compatible with a 2MHz 6809 and no wait- 
state 8 MHz 68000/68008. 

The reference voltage must be supplied externally. In Fig. 112.91 I have used 
the Plessey ZN040 4.01V ±1% bandgap voltage reference IC for this purpose. 
With the amplifier configuration shown, this gives a full-scale voltage output 
of nominally +4 V. The actual voltage may be trimmed over the range ± 5% by 
connecting pin 2 to the center tap of a 100 kQ potentiometer across pins 1 and 3. 
The choice of reference voltage is fairly arbitrary in our case, as both channels are 
to drive input amplifiers on an oscilloscope. If necessary, a Zener diode will act as 
a reasonable substitute (anode to top) with 5.6 V having the lowest temperature 
coefficient. The series resistance of 1.2 kQ gives a bias current of around 9 mA. 
The minimum current is given as 150 pA, with a ma xim um of 75 mA. This would 
also be suitable for a 5.6 V Zener diode. 

V' re f can vary over the range ±10V. By choosing a negative value, the single 
amplifier/channel configuration used in Fig. ll2.9l gives a positive unipolar range. 
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Analog out (0 to 4V) 



Figure 12.9 Interfacing the AD7528 to a microprocessor. 

The internal feedback resistor has been used with an external 3 3 pF polystyrene 
capacitor in parallel to stabilize the amplifier's high frequency behavior. The 
TL082 operational amplifiers feature a typical slew rate of 1 3 V/ps (minimum 
8 V/ps). Thus a full-scale swing of 4 V will take around 300 ns. Any operational 
amplifier can be used here, but note that the general purpose 741 type has a 
slew rate of only 0.5 V/ps. Analog power supplies of between ±8 V to ±15 V are 
suitable, and can be used for the reference voltage IC bias. 

Ideally the analog ground should be run directly back to the power supply 
common point, rather than make a direct connection to the noisy digital ground. 
Where this is done, it is recommended that two back to back signal diodes be 
connected between them, close to the IC. This reduces the chance of transient 
voltages injecting noise into the system. 

Testing the the D/A converter dynamically is covered in Section 15.2. A simple 
static test is possible before connecting the digital signals to the system. Firstly 
meter the V ref inputs and power supplies. Then keeping pins 6, 15 and 16 to 
digital ground, as well as all eight data lines, check output A is close to analog 
ground. Bring DB7 to V DD . Now output A should be ' V 7 ref . With all DB lines logic 1, 
the output voltage will be « V ref . Repeat with pin 6 at logic 1 , for output B. The 
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deselected cha nn el will retain its last value. Connecting all DB lines together to a 
TTL-compatible square wave generator and monitoring the analog outputs is an 
alternative test, which also checks delay and slew rate parameters. 

Handle carefully to avoid electrostatic discharge damage, and do not insert 
into a powered socket. 


12.3 Analog to Digital Conversion 

The opposite transformation conversion direction of analog to digital is by far 
the more complex of the two. One possible scenario involves an array of analog 
comparators, which simultaneously compare the analog input V m with a series of 
ascending voltage steps of ^ V re f. In Fig. ll2.10l these quantum steps are produced 
by a chain of eight equi-valued resistors, giving l H full-scale resolution. The output 
of any comparator is logic 1 if V in exceeds the quantum, otherwise logic 0. Thus 
eight unique binary patterns are produced, depending on the analog magnitude. 
A relatively simple logic decoder network converts this unary code to natural 
binary. A bipolar conversion is easily accomplished by tying the top and bottom 
of the resistor chain to + *V ref and -^V ref respectively. These so called flash or 
parallel converters are expensive, because of the large number of precision circuit 
components involved, for example 255 high-speed analog comparators for an 8- 
bit converter. However, this technique is fast, with conversion times of better 
than 30 ns achievable in commercial products (for example the TRW TDC1007J 
8-bit converter 0). 

For most microprocessor applications, the short conversion time is an em¬ 
barrassment, demanding direct memory access procedures to make full use of 
this speed. Instead, most MPU-compatible A/D converters use the more prosaic 
technique of successive approximation. This is the electronic equivalent of the 
beam balance. Consider an unknown weight placed in one pan of such a balance, 
and a range of known weights in powers of two available to the operator. A sys¬ 
tematic approach to the deter min ation of the unknown quantity is to start with 
the largest known weight. If this is too light, then it is left in the pan, otherwise 
it is removed. The same procedure is carried out in descending order until the 
smallest standard weight has been used. The solution is then the assemblage of 
weights in the pan, to the resolution of the smallest weight. 

One possible 8-bit electronic utilization of this strategy is shown in Fig. ll2.lf1 
Here a MPU addresses a digital to analog converter, the equivalent of the beam bal¬ 
ance. In software the various bit patterns from 1 0000000b down to 00000001 b 
can be added to the subtotal and sent out to the D/A converter. The resulting 
analog equivalent is compared to the incoming voltage, and the magnitude rela¬ 
tionship between them read through the 3-state buffer at bit 7 1101. 

A possible coding in C is given in Table IT272l Here constant pointers to the two 
peripherals are assigned addresses (assumed to be 6000 h and 6002 h), and two 
byte-sized variables defined to hold the assemblage of bits and the test pattern. 
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V in 

d 

0 -» V ref /8 0 

Vref/8 -> 2Vref/8 0 

2Vref /8 3V ref /8 0 

3Vref /8 -» 4Vref/8 0 

4V ref /8 -> 5V ref /8 0 

5V ref / 8 -» 6V ref /8 0 

6V re f/8 -» 7V ref /8 0 

>V ref /7 1 


Level 1 Level 2 

^6 ^5 d 4 d 3 d 2 d-| C B A 

000000 000 

0 0 0 0 0 1 0 0 1 

0 0 0 0 10 0 10 

0 0 0 10 0 011 

0 0 10 0 0 10 0 

0 10 0 0 0 10 1 

10 0 0 0 0 110 

0 0 0 0 0 0 111 


Figure 12.10 A 3-bit flash A/D converter. 
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Figure 12.11 A software controlled successive approximation D/A converter. 


The former is initialized to zero (nothing on the pan) and the latter to 80 h (largest 
known weight, 1 0000000b). 

The while loop adds the test weight to the trial pattern, sends it out to the 
D/A converter and checks the comparator output. If this is logic 1 (i.e. in line 13, 
the contents of the address comparator ANDed with 1 0000000b is non-zero; 
* comparator & 0x80) then the test weight is removed (subtracted); otherwise 
it is left as part of the aggregrate. Each new test weight is generated by shifting 
weight right once. As weight has been declared unsigned, this should be a 
Logic Shift Left operation (but strictly is compiler dependent). After eight passes 
through the loop, the state of the aggregate is the digital equivalent to V m . 

Although the circuit of Lig. 112.111 works, it is fairly slow. Typically an 8-bit 
conversion will take around 100 ps at best. A wide range of stand-alone succes¬ 
sive approximation converters are available to interface directly to MPU buses, 
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Table 12.2 C driver for Fig. 1 1 2.171 


#define d_a_address (unsigned char 
#define comp_address (unsigned char 

anaiog_i n() 

{ 

unsigned char * const d_a 
unsigned char * const comparator = 
register unsigned char digital 
register unsigned char weight 

while (weight != 0) 

{ 

digital += weight; 

*d_a = digital ; 
if (’-comparator & 0x80) 

{digital -= weight;} 
weight »= 1; 

} 

return (digital); 

} 


*)0x6000 
*)0x6002 


d_a_address; 
comp_address; 

0; /* The digital trial */ 

0x80; /* The walking weight */ 

/* Add weight to trial */ 

/* Send out to d_a converter */ 

/* IF too big */ 

/* THEN remove weight from trial */ 

/* weight divided by 2 */ 


/* Nearest approximation returned*/ 


taking 10 ps or less to complete an 8-bit conversion. Word sizes up to 16 bits are 
cataloged. 


Please insert Fig. 12.12 here. 


(a) Functional diagram. (b) Simplified functional circuit for DACA. 

Figure 12.12 Functional diagram of the AD7576 A/D converter. Reprinted with the permission of 
Analog Devices Inc. 
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For our project we have chosen to use the Analog Devices AD7576 8-bit A/D 
converter, as outlined in Fig. 112.121 The AD7576 is a monolithic device contain¬ 
ing all the digital and analog circuitry necessary to implement the successive 
approximation strategy. In Fig. 112.121 the block labelled SAR is the Successive 
Approximation Register, holding the bit pattern trial as it is built up (digital 
in Table ITZTI . The Control Logic box sequentially sets each flip flop in the SAR, 
clearing it shortly after, if the comparator (COMP) indicates that the analog output 
of the D/A converter (DAC) is above the input (A in ). The timing of this sequence 
is a function of the internal Clock Oscillator box, whose frequency is controlled 
by CR components at pin 5. The minimum conversion time is given as 10 p s. An 
external oscillator may alternatively be used to drive pin 5, and in this situation 
2 MHz gives the 10 ps minimum conversion time. 

The AD7576 operates in two modes, depending on the state of the MODE 
input. If pin 3 is high, then a low-going signal at the RD pin begins the conversion 
process, provided that the device is enabled (CS = 0). BUSY goes low during 
this process, and returns high when it has been completed. The new data is 
transferred to the internal latch register on the rising edge of BUSY. These latches 
are interfaced to the data bus via integral 3-state buffers, which are enabled when 
RD (i.e. Read) is low and the device is enabled. Thus the RD control is a dual 
purpose Start Convert and Read function, that is in reading data a new conversion 
is automatically initiated. 

The interface diagram of Fig. I12.13l a) uses the AD7576 in its asynchronous 
mode (pin 3 low). Here the A/D converter performs continuous conversions. Data 
in the output latches is always valid, and can be used (RD and CS low) at any time. 
With the clock CR components shown, the data is never more than 10 ps out of 
date. 

The AD7576 is powered by a single +5 V supply. As this will probably be com¬ 
mon with the logic supply, it should be decoupled to analog ground as close to 
the device as possible, with a recommended 47 pF tantalum capacitor in parallel 
with a 0.1 pF ceramic capacitor. This supply, and analog ground, should be run 
directly back to the power supply. 

The internal D/A converter requires an external V ref of 1.23 V ± 5%. This is pro¬ 
vided from an AD589 bandgap reference, and should be decoupled in the same 
way. With this value of V ref , full scale at 2V ref is 2.46 V, giving a nominal 10 mV 
resolution. The internal D/A converter suffers from the same errors discussed 
in Section 12.2 The resulting non-linear error (the relative accuracy) is either ±1 
or ± \ bit maximum, depending on the device selection type. This is within our 
specification. 

The input analog range is unipolar 0 - 2 V ref . The simple operational amplifier 
network shown in Fig. l 12.13T b) will convert a bipolar input to the necessary range, 
by adding a constant bias. This offset may be alternatively incorporated into the 
anti-aliasing filter. The resulting code is in offset binary form and can, if neces¬ 
sary, be converted to 2's complement form by inverting the MSB (see page 13321 . 


One consideration remains. The analog input is changing during the time the 
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Microprocessor 


(a) Interface connections. 

R 

----- V ref 


R 



(b) Level shifting a bipolar analog signal to the range 0 to 2 V re f . 


Figure 12.13 Interfacing the AD7576 to a microprocessor. 

conversion takes place. Accuracy considerations dictate that any change should 
not exceed one bit during this aperture time. Taking, as a worst-case situation, 
a sinusoid swinging through the full scale, as shown in Fig. 112.141 then we can 
determine the rate of change by differentiation: 

Rate of change (^) is V re fto coscot 
Maximum when cos cot = 1 is V ref to volts s _1 
Aperture time is 10 ps, therefore: 
change in 10 ps ( 5 ) is 10 _5 V ref to 
and thus: 

5 < 1 bit 
10 _5 V re fCO < 


256 
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co < 781 radians s 1 
f < 124 Hz 

This is well within our specification, but if, say, a 12-bit conversion was needed 
within 16 ps, then the upper frequency falls to less than 10 Hz! In such cases a 
sample and hold (S/H) circuit preceding the A/D converter must be used. This 
captures the signal, with typically a 40ns aperture time till . The principle of 
most S/Hs involves a capacitor being charged up during the sample period, and 
held whilst conversion occurs. As with A/D and D/A converters, S/H circuits are 
normally obtainable as monolithic integrated circuits. 

Although S/H aperture times are low, they may take several ps to stabilize, 
after which conversion can commence. They tend to droop during hold, as the 
capacitor looses its charge (typically 20 pV/ms), and suffer from all analog ill¬ 
nesses of the flesh, such as drift, offset and non-linearity. Thus the S/H must be 
matched to the A/D converter's performance. 
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CHAPTER 13 


The Target Microcomputer 


From our discussion, the target microcomputer will have the following facilities: 

1. A single-channel 8-bit analog input port. 

2. A dual-channel analog output port. 

3. A single-bit digital output port for the flyback blank. 

4. A single-bit digital input port to read the freeze switch. 

All this is in addition to the memory, address decoder and other necessary sup¬ 
port circuitry. 

In consideration of the requested sampling rate variation of -50% to + 100% 
around the no mi nal 128 per second value, both of the following circuits use an 
oscillator connected to the MPU's interrupt line(s). The interrupt frequency can 
easily be varied using a potentiometer. Furthermore, a switch connected to this 
sampling oscillator's Reset acts as a convenient freeze input. No sample rate - 
no new samples. 

The alternative scheme requires a switch port, not only to read the freeze- 
request switch (see Fig. |11.3> , but to read several switches requesting the sam¬ 
pling rate. Although I have not used this technique, the two microcomputers 
developed in this chapter have 4-bit switch ports provided. This gives a Read¬ 
option expansion capability, and is exploited in Chapter 14, where diagnostic 
software tests are discussed. 

The provision of an 8-bit digital port is a little more expensive than the neces¬ 
sary 1-bit output. This is also useful for diagnostic purposes and gives additional 
scope for expansion. 

Microcomputers based on both the 6809 and 68008 MPUs are developed in 
the next two sections. By using C to target two different MPUs, we will be able to 
investigate one of the major advantages of a high-level language. 


13.1 6809 - Target Hardware 

The implementation shown in Fig. ll3.ll is based on a 6809 MPU running at 1 MHz. 
This is set by the 4MHz crystal/capacitor network Yl, C9, CIO. A power-on 
manual Reset signal of a no mi nal 100ms duration is provided by S3, Cl 1, R1 5. 
This relies on the Schmitt trigger action of RST, which is described in Section 1.1. 
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Fiqure |T3TTt The 6809-based embedded microprocessor implementation {continued next 
page). 
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Figure 13.1 (continued! The 6809-based embedded microprocessor implementation. 
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Samples are acquired at a rate dictated by the astable network U7, C6, R3, R7. 
Based on a 555 timer m, the total period is given by the relationship: 


t p = 0.693(R7 + 2R3)C6 

and can be varied with R3 from nominally 60 - 250 Hz. The 5 5 5 is a noisy device, 
and thus the +5 V supply should be locally well decoupled. By connecting SI to 
the 555's Reset, the astable can be halted. Thus no further updates will occur, 
giving a frozen display. NMI is used as the interrupt input, as its edge-triggered 
nature obviates the need for an external interrupt flag, such as used in Fig. |6.6| 
All unused interrupt lines, as well as BREQ and MRDY, are tied high through R5. 
HLT has its own pull-up resistor R4, as this line is frequently used by in-circuit 
emulators to control the progression of the MPU. 

The address map for the system is: 


0000 - 07FF/t 
2000 h 
2001 h 
4000 h 
8000 h 
AOOOh 
E000-E7FFh 


6116 RAM 

Analog output channel X 
Analog output channel Y 
Analog input channel 
Digital input port 
Digital output port 
2716 EPROM 


The address decoder comprises U3 and U4. The 74HCT138 splits the mem¬ 
ory map into eight 8 kbyte pages, six of which enable the devices above. All 
Write-to devices include Q as part of their enabling logic. The digital output 
port U2 is clocked by the rising edge of this strobe, whilst the dual D/A converter 
A/Dl is enabled by it. The 6116 RAM uses Q together with R/W as a modified 
Read/Write control. This is shortened during a Write cycle, as described in Fig. [L8l 
Output_Enable is driven by R/W to ensure that no data is output during the pre¬ 
mature ending of a Write cycle. OE of the 2716 EPROM (usually labelled V pqm ) 
is similarly enabled, to prevent accidental writing to a read-only memory. NAND 
gates U4 provide these auxiliary functions. 

Both RAM and EPROM have a 2 kbyte capacity, which is more than adequate 
for our application. With a 1 MHz clock frequency, any speed selection will be 
suitable. With a 2 MHz clock, a 300 ns EPROM is required. Although it is possible 
to purchase such a 2716 (or Texas 2516) it is easier and cheaper to use a 2764 
8 kbyte device at this speed (see Fig. 113.31 . If deshed, an integral battery backup 
48Z02 RAM may be directly substituted for the 6116. RAMs with an access time 
of 150 ns (min) should be used for a 2 MHz processor. 

The two analog ports are as described in Figs ll2.9l and ll2.131 Figure [lTT]does 
not show any necessary filtering and buffering. 

Quad 3-state buffer U9/1 0 provides input port facilities for four switches. This 
74HCT125 is directly enabled from the address decoder. A 74HCT377 connected 
as described in Fig. 11.71 gives a byte-sized digital output port. One of the lines 
can be used to blank out the CRO during flyback, and the others are free. Some 
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CROs require large negative voltages, typically -40 V, to perform this function. 
In such cases a suitable transistor buffer and power supply will be required. 

A free-run facility, HDR1, R1 6, D1, D2 and SW2 is shown in Fig. 113.11 This al¬ 
lows the user to exercise the processor before software is available for the EPROM 
and without using an in-circuit emulator. Its action is described on page [399] 

The complete circuit requires +5V at typically 250 mA and ±12 to ±15 V at 
25 mA. The analog ± 15 V is conveniently supplied from a dual d.c./d.c. converter, 
such as the Citec BC5151S ±5V to ±15 V device. Care should be taken, as most 
converters are not short-circuit proof. Any analog grounds should be returned to 
this power supply 0 V together with the ±5 V's ground return. The supplies should 
be decoupled using a mix ture of 1 pF tantalum and 0.1 pF ceramic capacitors at 
around one capacitor each two devices. 

Any suitable wiring technique may be used for the prototype. We use wire- 
wrap with considerable success. This avoids close parallel paths for the clock and 
bus signals and reduces crosstalk. It is especially important to keep the analog 
signals as far away from such digital lines as possible. Whatever technique is 
used, it is important to color-code any wiring to aid in the debug phase. Several 


Title 6809 TCM PAL Address decoder 
Chip PAL16L8 
;pin list 

A15 A14 A13 R_W Q NC NC NC NC VCC 

NC /OE RAM_RW /ROM_EN /DIG_OUT /DIG_IN /AN_IN /AN_OUT /RAM_EN VCC 


Equations 


/RAM„EN = 
/AN_OUT = 
/AN_IN = 
/DIG_IN = 
/DIG_OUT= 
/ROM_EN = 
RAM_EN = 
/OE 


/A15*/A14*/A13 
/A15*/A14* A13*/R_W*Q 
/A15* A14*/A13*R_W 
A15*/A14*/A13*R_W 
A15*/A14* A13*/R_W*Q 
A15* A14* A13*R_W 
R_W+/R_W*Q 
/R_W 


i RAM enabled at 0000 

s Analog output on 2000h and Write and Q 
; Analog input on 4000h and Read 
; Digital in on 8000h and Read 
; Digital out on AOOOh and Write and Q 
; EPROM at EOOOh and Read 
; Modified R/W 
; For memory OE 


/RAM_EN o TRST = VCC 
/AN_OUT.TRST = VCC 
/AN_IN.TRST = VCC 
/DIG_IN o TRST = VCC 
/DIG_OUT oTRST= VCC 
/ROM_EN.TRST = VCC 
RAM_RW.TRST = VCC 
/OE.TRST = VCC 

S Tie 74377's /CE to /DIG_OUT 
; Tie AD7528,s /CS and /WR to AN_OUT 



/RAM_EN 

/AN_0UT 

/AN_IN 

/DIG_IN 

/ROM_EN 

/RAM_EN 

RAM_RW 

/OE 


Figure 13.2 A PAL-based 6809 address decoder implementation. 
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strands should be used for the +5 V and its return paths. 

One final point refers to the address decoder. Most current circuitry uses a 
PAL (Programmable Array Logic) implementation to reduce the chip count. The 
circuit of Fig. ll3.ll is so simple that it is unlikely to be commercially viable to do 
so. However, the design is straightforward and if you have access to a PAL pro¬ 
grammer and CAD software, then a PAL16L8 provides for ten inputs and eight 
active-low outputs in a single 20-pin package. Connection details and the requi¬ 
site equations in the PALASM2 language m are given in Fig. 113.21 Other languages 
use a similar, but not identical, notation. Note that the analog and digital output 
ports are qualified internally by Q. The absence of an explicit Q (indicated as /Q 
in PALASM language), means slightly rewiring those devices using this qualifier, 
as indicated on the equation comments. A complete design of a PAL-based 6809 
system is given in Example 6.2 of reference (3]. 


13.2 68008 - Target Hardware 

The implementation of Fig. 113.31 is based on a 68008 MPU, running at 8 MHz. 
Recall from Fig. 13.31 that the 68008 device is a full 68000 processor, but with 
an 8-bit data bus, single Data Strobe (DS) and a commoned IPL0/IPL2 interrupt 
request line. 

The 68008 is externally reset when both Halt and Reset lines are asserted. 
Although an active period of only ten clock cycles (1.25 ps for an 8 MHz clock) is 
required for a successful initialization, at least 100 ms is required on power up. 
This provides for stabilization of the system clock and on-chip circuitry. The 
situation is further complicated by the fact that both Halt and Reset can be used 
by the MPU as an output. Halt is asserted if the processor detects a double-bus 
fault, for example where the the initial PC or SSP addresses in the vector table are 
odd. Remember that odd addresses are illegal. The privileged instruction RESET 
forces the Reset pin low, which is used to initialize external peripheral devices. 

These requirements are met in Fig. 113.31 using a 555 timer connected as a 
monostable, through 3-state buffers U4A and B to Reset and Halt. The active 
period of the monostable is set by R3/C1 , according to the relationship ||4j: 

t p = 1.1 (R3 Cl) ~ 500ms 

The monostable is triggered when TR (pin 2) rises above |v cc , and is delayed on 
power-up by R4/C2. 

Sampling is regulated by using an astable to drive interrupt lines IPLO/2 IPL1 
in parallel. This gives a level 7 non-maskable interrupt, at a rate varying between 
nominally 60 and 250 Hz. Design details are given on page !348l No external inter¬ 
rupt flag is required due to the edge-triggered nature of the interrupt. Gate U2A 
decodes Function Code 111, the interrupt acknowledge condition, and by driving 
VPA ensures that autovector 31 is used as the pointer to the interrupt service 
routine (see Fig. l6.8L 

The memory map for the system is: 
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Figure [T~3.3t The 68008-based embedded microprocessor implementation (continued 
next page). 
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Figure 13.3 (continued) The 68008-based embedded microprocessor implementation. 
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00000-01 FFFh 
02000 h 
02001 h 
04000 h 
08000 h 
0A000 h 
OEOOO-OFFFF h 


27C64 EPROM (250 ns or better) 
Analog output Channel X 
Analog output Channel Y 
Analog input channel 
Digital input port 
Digital output port 
6264 RAM (or 6116) 


The address decoder comprises U9, Ul 0 and U4C. The 74HCT138 splits mem¬ 
ory up into eight 8 kbyte pages. Address lines a] 9 - a] 6 are ignored by this scheme, 
and this of course gives 15 images of each page. Gates Ul 0/U4C detect wherever 
a memory access is made, and activate DTACK. All peripheral devices are fast 


Title 68008 TCM Address decoder 
Chip PAL20L10 
;pin list 

A13 A14 A15 /AS /DS R_W FC0 FC1 FC2 NC NC GND 

NC /DTACK /IACK DS /R_W /RAM_EN /DIG_OUT /DIG_IN /AN_IN /AN_OUT /ROM_EN VCC 


Equations 


/DTACK 

/IACK 

DS 

/R_W 

/ROM_EN = 
/AN„OUT = 
/AN_IN 
/DIG_IN = 
/DIG_OUT = 
/RAM_EN = 


/RAM_EN+/DIG_OUT+/DIG_IN+/AN_IN+/AN_OUT+/ROM_EN 

FC0*FC1*FC2 

DS 

/R_W 

/A15*/A14*/A13*/AS*R_W 
/A15*/A14* A13*/R_W 
/A15* Al4*/Al3*/AS * R_W 
A15*/A14*/A13*/AS*R_W 
A15*/A14* A13*/AS*/R_W 
A15* A14* A13*/AS 


; Any legitimate device selected 
; FC = 111 for an interrupt 
I Complement of /DS 
; Complement of R/W 
; EPROM from 0000-lFFFFh and Read 
j Analog output on 2000h and Write 
; Analog input on 4000h and Read 
; Digital in on 8000h and Read 
I Digital out on AOOOh and Write 
; RAM enabled at EOOO-FFFFh 


/DTACK»TRST 

= 

VCC 

DS.TRST 

= 

VCC 

/R_W = TRST 

= 

VCC 

/ROM_EN.TRST 

= 

VCC 

/AN_OUT.TRST 

= 

VCC 

/AN_IN o TRST 

= 

VCC 

/DIG_IN„TRST 

= 

VCC 

/DIG_OUT.TRST 

= 

VCC 

/RAM_EN.TRST 

= 

VCC 


A 1 3 
A 1 4 

Al 5 

/AS 

R_W 

FC0 

FC1 

FC2 

/DS 


/DTACK- 

/IACK 


20L1 0/ 
22V10 


23 

22 

21 

20 

19 

18 

17 

16 

15 

4 


/R0M_EN 

/AN_0UT 

/AN_IN 

/DIG_IN 

/DIG_0UT 

/RAM_EN 

/R_W 

DS 


Figure 13.4 A PAL-based 68008 address decoder implementation. 
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enough to support direct feedback in this manner, without the necessity of in¬ 
troducing a delay as shown in Fig. 13.91 Care should be taken that the EPROM has 
an access time of 250ns or better, and the RAM has a 120ns maximum access 
time (see Section 3.3). Alternatively, a lower-frequency clock oscillator (minimum 
2 MHz) can be used with slower devices. The digital output port is clocked by the 
falling edge of DS, at which time data on the bus has stabilized (point 5 in Fig. 13.71 . 
Both RAM and analog output ports are enabled when DS is active. The EPROM is 
only enabled when R/W is high, to prevent an accidental Write-to operation. The 
RAM's output buffers are similarly disabled when R/W is high. Interface details 
for the AD7528 and AD7576 are given in Figs 1 12.91 and 112.0 respectively. 

A free-run facility, HDR 1 and HDR2 is shown between the data bus/DTACK 
and the MPU. By substituting the two headers, the user can check out the system 
before software is available for the EPROM and without using an in-circuit emu¬ 
lator. It is also a useful diagnostic aid when the system is in service. Its action is 
described on page |400l 

The complete circuit requires +5V at typically 300mA and ±12/ ± 15 V for 
the analog circuitry, at 25 mA. Normal power supply and decoupling practice, 
as described in the last section, should be followed. However, the data sheet 
indicates that the 68008 MPU can take current peaks of 1.5 A El- Thus a direct 
connection using heavier or multiple wiring between the 68008's power pins and 
the power supply is recommended, as is local decoupling. 

If you have access to a PAL programmer, a PAL20L10 or 22V10 can be used to 
implement the address decoder and other glue logic. Chips U9, U1 0 and U4C/R2 
are replaced by the one 24-pin device. Connection details and the requisite equa¬ 
tions are given in Fig. 113.41 
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CHAPTER 14 


Software in C 


From Section 11.1, two main tasks can be identified. 

Task 1: 

BEGIN: 

Forever do: 

Scan and send out to the Y-plates the 256 stored array values from oldest to 
newest, while incrementing and sending out the X count to the X-plates (left to 
right). 

Flyback: 

End each scan with a flyback procedure. 

END: 

Task 2: 

BEGIN: 

Forever do: 

At regular intervals interrogate input and place sampled value into array, over¬ 
writing oldest value. 

END: 

In the remainder of this chapter we will develop the necessary data structures 
and interaction between these tasks. From this, a general program in C is devel¬ 
oped; followed by topics specific to the two chosen targets. 


14.1 Data Structure and Program 

Central to the software implementation of our time-compressed memory is the 
data organization. This conprises an array of 256 bytes, each holding a sample 
of the analog input. This array is to be scanned at high speed from the oldest to 
newest sample. At the same time, at around 128 times per second, a new value 
is to overwrite the oldest element, and the pointer-to oldest moved on one place. 

Treating the array as a circular structure, as shown in Fig. 114.11 emphasizes the 
repetitive nature of the scan and update. Of course this closed data organization 
is conceptual only; the array is stored in RAM in the normal linear manner. 

Two tasks have been identified. The scanning task is sequenced by the vari¬ 
able i, which counts from 0 to 255. By adding this index to the pointer 01 dest, 
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Figure 14.1 Data stored as a circular array. 

elements are accessed from the oldest member (i = 0) to the newest member 
(i = 255). At the same time, i is converted to its analog equivalent and hence 
drives the X spot from left (oldest) to right (newest). 

The job of the updating task is to fetch a sample into the array, to where 
Oldest points, and move that index on one. Thus the element just before Oldest 
is the most juvenile sample. When a whole scan of 256 samples has been com¬ 
pleted, flyback occurs and the process begins again, but this time beginning from 
the current most ancient element. The circular manner of this scan is simulated 
by wrapping around the sum of 01 dest plus i modulo-256, that is from 255 back 
to 0 (11111 111 b + 1 = 00000000b). 

The software implementation of our time-compressed memory is given in Ta¬ 
ble [14J] The two tasks are assigned to different functions, mai n() implements 
the initialization, repetitive scan and flyback. New samples are acquired and 
the array and Oldest pointer updated by the function update(). This is de¬ 
signed to be entered via an interrupt, and so no data is sent or returned from it. 
Communication between tasks is via the global data array Array [] and global 
index Oldest. Both are defined before mai n(), and therefore are known to both 
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Table 14.1 The fundamental C coding. 

/* Version 16/11/89 */ 

#include <hard.h> 

unsigned char Array [256]; /* Global array holding display data */ 

unsigned char Oldest; /* Index to the Oldest inserted data byte (left point on screen)*/ 

main() 

{ 

register short int i; /* Scan counter */ 

register unsigned char leftmost; /* The initial array index when x is 0 */ 

unsigned char * const x = ANALOG_X; /* x points to a byte @ (address) ANALOG_X */ 

unsigned char * const y = ANALOG_Y; /* y points to a byte @ (address) ANALOG_Y */ 

unsigned char * const z = Z_BLANK; /* The z-mod port (digital port) */ 

Oldest =0; /* Start New index at beginning of the array*/ 

for(i=0; i<256; i++) /* Clear array */ 

[Array[i] =0;} 

while(l) /* Do forever display contents of array */ 

{ 

leftmost = Oldest; /* Make leftmost point on the screen the oldest sample */ 

for (i=0; i<256; i++) 

{ 

*x = (unsigned char)i; /* Send x co-ordinate to X plates */ 

*y = Array[(leftmost+i)&0x0ff]; /* and the display byte to the Y D/A */ 

} 

*z = BLANK_0N; /* Blank out for flyback */ 

*x = 0; /* Move to right of screen */ 

*y = Array[01dest]; /* Y value at left of screen */ 

for(i=0; i<5; i++) {;} /* Delay */ 

*z = BLANK_0FF; /* Blank off */ 

} /* Do another scan */ 


} 

^*********** * * * * * * * * ******************** * * * * * *************** * * * * * * * * * * * * * *************** 

* This is the NMI interrupt service routine which puts the analog sample in the array * 

* and updates the New index * 

* ENTRY : Via NMI and startup * 

* ENTRY : Array[] and Oldest are global * 

* EXIT : Value held at a_d in Array[01dest], Oldest incremented with wraparound * 

* at 256 (modulo-256) * 

******** * * * * * * * * * * * * * * * * * ********* * * * * * * * ****************** * * * * * * * * * * * * * * * * ************ 


void update(void) 

{ 

volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */ 

Array[01dest++] = *a_d; /* Overwrite oldest sample in Array[] & inc Oldest index */ 

} 


functions. The former is defined as having 256 unsi gned char (byte) elements, 
whilst the latter is a single unsigned char. Each element therefore can vary from 
0 to 255. Details of the entry to updateO and the header file hard. h, included at 
the beginning of the file, are discussed in Sections 14.2 and 14.3. The header file 
contains hardware-related detail, such as the locations of the various peripheral 
devices. 

mai n () begins by defining five local variables. Both i , the integer scan counter, 
and leftmost, the char element indicating the most ancient array entry, are 
defined as being of type regi ster. Both are used inside the scan loop, and will 
benefit from being stored internally. Processors with insufficient registers will 
ignore this request. The variables x, y and z are defined as being fixed pointers to 
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unsi gned chars (bytes), and are assigned as ANAL0G_X, ANAL0G_Y and Z_BLANK, 
which are given values (addresses) in the header file. As they are qualified as 
const, any subsequent attempt to change them will be reported by the compiler 
as an error. 

The program proper commences by zeroing the global variables Oldest and 
Ar ray []. Strictly this run-time initialization is not necessary, as ANSIIC specifies 
that global variables are to be considered zero if not explicitly initialized in their 
definition. To simulate this situation, the relevant RAM locations could be zeroed 
in the startup routine. In this case we have chosen to do this in the C coding. 
Actually the system will operate perfectly satisfactorily if not cleared, but there 
would be a 2-second transient display while the array was being filled with the 
first 256 samples. 

After initialization, an endless loop is entered inside the body of whi 1 e (1). At 
the commencement of this loop the local variable 1 ef tmost is equated to 01 dest. 
This prevents changes in Oldest during the scan (i.e. via update()), altering the 
display. 

The scan itself uses a for loop construction, with i acting as the loop counter, 
i has been defined as an i nt, so that the condition i < 256 False can be used as 
a loop terminator. If i was a char, it would wrap around at 255. In this situation 
a break on i = = 2 5 5 at the closing brace should be used as the out condition (see 
Table H4~8k 

The for body simply assigns the contents of x (the ANAL0G_X output) to i (0 
to 255), and the contents of y (the ANAL0G_Y output) to the array element. The 
index of the array is the sum of the X co-ordinate (i.e. i) plus the 1 ef tmost value, 
truncated to 8-bits (modulo-256) by ANDing with 00001 1111111b (OxFF). This 
stratagem achieves a wrap around at 255. For example if 1 ef tmost were 180 and 
i were 159, then Array [85] is the value sent to the Y-plates (180 + 159 is 83 when 
added modulo-256). A similar result could be obtained if the sum was given an 
independent i nt-sized existence and then cast to char. I have used such a cast in 
equating the char-sized contents of x to the integer i, x = (unsigned char)i;. 
In practice the compiler will truncate the rjvalue in assigning to a small Lvalue 
(see page !223t . 

Flyback is generated by sending the correct patterns to the Z port (BLANK_0N 
is defined in the header), zero to the X-plates and the initial array value to the 
Y-plates. A short null for-loop gives a delay, before the BLANK_0FF pattern is 
sent out to Z. After this, the scan begins again. 

Function update() is very short. The local pointer variable a_d is defined as 
being the const address ANINPUT, whose absolute value is given in the header. 
This pointer is to an unsigned char (byte) which is volatile (changes sponta¬ 
neously) and is const (read-only). The value read from this port is then put into 
the array at the oldest index, and the global variable Oldest automatically in¬ 
cremented. We are relying here on the char nature of Oldest wrapping around 
at 255. An explicit wraparound would be necessary for other array lengths. 

Function update () assumes that the analog to digital converter can be treated 
as a simple read-only input port. In that respect the program is not portable. 
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Normally, a separate function is used for more complex parts, frequently called 
getchar (). Such a function would be part of an input/output library, which was 
hardware specific, or would appear in the header file. Similar assumptions have 
also been made for output in mai n(). 

Portability has been further compromised by the assumption that char objects 
are 8-bit wide. In practice this is true for the vast majority of microprocessor 
targeted compilers. However, 9-bit character systems do exist, and the use of 
complex character sets, such as Japanese, requires 16-bit characters. ANSII C 
makes no guarantees regarding the 8-bit nature of char objects. 

In the next two sections we look at machine-specific details regarding our two 
target circuits of Figs 113. l| and |13.3l 


14.2 6809 - Target Code 

The first of our targets is the 6809 hardware implementation of Fig. 113.11 The 
header file hard_09. h of Table ll4.~2l gives a memory map of the various peripher¬ 
als and defines the two digital patterns BLANK_0N and BLANK_0FF for the Z out¬ 
put. ROM and RAM details are given for use by the diagnostic software of Sec¬ 
tion 15.2. Including this header customizes the C program of Table [1411 to the 
6809 target board, and no further modification is required. 

A complete listing of 6809 assembly-level code intermingled with the original 
C source-code is shown in Table 114.31 This was produced using the Intermet- 
rics/COSMIC 6809-C cross-compiler V3.3. I have tidied up the original compiler 
output, for example removing remarks inserted by the optimizer, and added com¬ 
ments, indicated by the prefix ; ##. 

The added comments are self-explanatory and will not be discussed in the 
text. There are, however, several points to note. Firstly the register qualifier 
for the variables i and leftmost have been ignored by the compiler. This is 
common for 8-bit MPU targets, as most of these devices are characterized by 
a paucity of registers. This is rather a pity, as the Y register is not used and 
would conveniently hold the loop variable i. This would remove the necessity for 
the double-precision incrementation for i ++ (e.g. lines 76 - 78 could be replaced 
by LEAY 1, Y) and the addition of lines 68 - 73 could be replaced by LDB -3 , U: 


Table 14.2 The hard_09.h header file. 


#define 

ANAL0G_X 

(unsigned 

char 

*) 

0x2000 

/* 

Analog output to X amplifier 

V 

#define 

ANAL0G_Y 

(unsigned 

char 

*) 

0x2001 

/* 

Analog output to Y amplifier 

V 

#define 

ANINPUT 

(unsigned 

char 

*) 

0x6000 

/* 

Analog input port at 6000h 

V 

#define 

SWITCH 

(unsigned 

char 

*) 

0x8000 

/* 

Digital input port at 8000h 

V 

#define 

Z_BLANK 

(unsigned 

char 

*) 

OxAOOO 

/* 

Digital output port at AOOOh 

V 

#define 

RAM_START 

(unsigned 

char 

*) 

0x0000 

/* 

6116 chip starts at location OOOOh 

V 

#define 

RAM_LENGTH 




0x800 

/* 

6116 byte capacity is 2K or 800h 

V 

#define 

R0M_START 

(unsigned 

short 

■ * 

)0xE0O0 

/* 

2716 chip starts at location EOOOh 

V 

#define 

R0M_LENGTH 




0x800 

/* 

2716 byte capacity is 2K or 800h 

V 

#define 

BLANK_0N 




OxFF 

/* 

Bit pattern to blank out beam 

*/ 

#define 

BLANK_0FF 




0 

/* 

Bit pattern to enable beam 

V 
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CLRA: LEAY D,Y: LDB _Array ,Y. As both processes are done in each loop pass, 
the savings are obvious. Table [T4~9l shows code where the regi ster qualifier is 
obeyed. 

ANSI C specifies that chars and shorts are promoted to i nts during process¬ 
ing (see Fig. 18.41 . Objects larger than bytes (chars) are handled with difficulty in 
most 8-bit processors. The Intermetrics/COSMIC 6809 C cross-compiler permits 
processing of chars in their byte form. Thus the assignment 1 eftmost = Oldest; 
is simply implemented in lines 54 and 55 using Accumulator_B only. However, 
the usefulness of this option is not fully realized in this particular instance, as i 
is a 16-bit object, and most arithmetic involves this variable. 


Table lT473l 6809 code resulting from Tables fM. 1 l and fl 4. 2\ ( continued next page). 


1 

Compilateur 

C pour MC6809 (COSMIC-France) 


2 



.list + 




3 



.psect _text 




4 

1 

/* Version 16/11/89 



V 

5 

2 

#include 

<hard_09.h> 




6 

1 

#define 

ANAL0G_X (unsigned 

char *)0x2000 /* Analog output to X amplifier 

*/ 

7 

2 

#define 

ANAL0G_Y (unsigned 

char *)0x2001 /* Analog output to Y amplifier 

V 

8 

3 

#define 

ANINPUT (unsigned 

char *)0x6000 /* Analog input port at 6000h 

V 

9 

4 

#define 

SWITCH (unsigned 

char *)0x8000 /* Digital input port at 8000h 

V 

10 

5 

#define 

Z_BLANK (unsigned 

char *)0xA00O /* Digital output port at AOOOh 

V 

11 

6 

#define 

RAM_START (unsigned 

char *)0x0000 /* 6116 chip starts at OOOOh 

V 

12 

7 

#define 

RAM_LENGTH 


0x800 /* 6116 byte capacity is 2K or 800h*/ 

13 

8 

#define 

R0M_START (unsigned 

short *)0xE000/* 2716 chip starts at EOOOh 

V 

14 

9 

#define 

R0M_LENGTH 


0x800 /* 2716 byte capacity is 2K or 800h*/ 

15 

10 

#define 

BLANK_0N 


OxFF /* Bit pattern to blank out beam 

V 

16 

11 

#define 

BLANK_0FF 


0 /* Bit pattern to enable beam 

V 

17 

3 

unsignec 

char Array [256]; 


/* Global array holding display data 

V 

18 

4 

unsignec 

char Oldest; /* Index to the Oldest inserted data byte (left point 

on) 

19 

5 






20 

6 

mainO 





21 

7 

{ 





22 

E00D 

3440 _main: pshs u 


## Open a frame 


23 

E00F 

33E4 

leau ,s 


## U is the Top Of Frame (T0F) 


24 

E011 

3277 

leas -9,s 


## Nine bytes deep 


25 

8 

register 

short int i; 


/* Scan counter 

*/ 

26 

9 

register 

unsigned char leftmost; /* The initial array index when x is 0 

V 

27 

10 

unsignec 

char * const x = ANAL0G_X;/* x points to a byte @ (address) ANALOG. 

_x*/ 

28 

E013 

CC2000 

Idd #2000h 


## Put constant 2000h in frame at FP-5/-4 


29 

E016 

ED5B 

std -5,u 




30 

11 

unsignec 

char * const y = ANAL0G_Y;/* y points to a byte @ (address) ANALOG. 

_Y*/ 

31 

E018 

CC2001 

Idd #2001h 


## Put constant 2001h in frame at FP-7/-6 


32 

E01B 

ED59 

std -7,u 




33 

12 

unsignec 

char * const z = Z_ 

.BLANK; /* The z-mod port (digital port) 

V 

34 

E01D 

CCA00O 

Idd #0A000h 


## Put constant AOOOh in frame at FP-9/-8 


35 

E020 

ED57 

std -9,u 




36 

13 

Oldest = 

0; 


/* Start New index at beginning of the array */ 

37 

E022 

7F0001 

clr _01dest 




38 

14 

for(i=0; 

i<256;i++) 


/* Clear array 

*/ 

39 

E025 

4F 

cl ra 




40 

E026 

5F 

cl rb 




41 

E027 

ED5E 

std -2,u 


## i lives in FP-2/-1; is cleared, i=0 


42 

E029 

AE5E LI: Idx -2,u 


## Get i into X 


43 

E02B 

8C0100 

cmpx #256 


## i<2 56? 


44 

E02E 

2C0C 

jbge L14 


## IF not THEN jump out of for loop 
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Table l~R~H 6809 code resulting from Tables lT4. 1 1 and PI 4.2\ (continued next page). 


45 

; 15 

{Array[i]=0;} 





46 

E030 

6F890002 

cl r 

.Array,x 

## 

EA is Array[0]+i , clear it 


47 

E034 

6C5F 

i nc 

—1, u 

## 

Double-precision increment of 16-bit int i, 

i++ 

48 

E036 

2602 

jbne 

L4 




49 

E038 

6C5E 

i nc 

-2, u 




50 

E03A 

20ED 

jbr 

LI 




51 

; 16 

while(1) 




/* Do forever display contents of array 

V 

52 

; 17 

{ 






53 

; 18 

leftmost = Oldest;/* Make leftmost point on the screen the oldest sample*/ 

54 

E03C 

F60001 L4: 

ldb 

.Oldest 

## 

Put Oldest array index 


55 

E03F 

E75D 

stb 

-3, u 

## 

in FP-3/-2, where leftmost lives in the frame 

56 

; 19 

for 

i=0; i<256; i++) 




57 

E041 

4F 

cl ra 





58 

E042 

5F 

cl rb 





59 

E043 

ED5E 

std 

-2, u 

## Again i=0 


60 

E045 

AE5E L16 

: ldx 

-2, u 

## 

Get i into X 


61 

E047 

8C0100 

cmpx 

#256 

## 

i<256? 


62 

E04A 

2C1C 

jbge 

L17 

## 

IF not THEN jump out of for loop 


63 

; 20 


{ 





64 

; 21 


*x = (unsigned char)i 

/* Send x co-ordinate to X plates 

*/ 

65 

E04C 

EC5E 

ldd 

-2, u 

## 

Get i into D 


66 

E04E 

E7D8FB 

stb 

[-5,u] 

## 

Put lower byte (char) indirectly into X D/A 


67 

; 22 


*y = Array[(leftmost+i)&0x0ff];/* and the display byte to the Y D/A*/ 

68 

E051 

E65D 

ldb 

-3, u 

## 

Get leftmost out of the frame into B 


69 

E053 

4F 

cl ra 


## 

extended to 16 bits (int) 


70 

E054 

E35E 

addd 

-2, u 

## 

Add to int i; leftmost+i 


71 

E056 

4F 

cl ra 


## 

Neat way of ANDing with 0000 0000 1111 1111b! 

72 

E057 

1F01 

tfr 

d,x 

## 

X holds (leftmost+i)&0xff 


73 

E059 

E6890002 

ldb 

Array, x 

## 

EA is Array[0]+(leftmost+i)&0xff; get element 

74 

E05D 

E7D8F9 

stb 

[-7,u] 

## 

Put Array[(leftmost+i)&0xff] indirectly into 

Y 

75 

; 23 


} 





76 

E060 

6C5F 

i nc 

-1, u 

## 

Double-precision increment of 16-bit int i, 

i ++ 

77 

E062 

2602 

jbne 

L6 




78 

E064 

6C5E 

i nc 

-2, u 




79 

E066 

20DD L6: 

jbr 

L16 




80 

; 24 

*z = 

BLANK_ON; 


/* Blank out for flyback 

*/ 

81 

E068 

C6FF L17 

: ldb 

#255 

## 

Send out indirectly 1111 1111b to Z 


82 

E06A 

E7D8F7 

stb 

[-9,u] 




83 

; 25 

*x = 

0; 



/* Move to right of screen 

*/ 

84 

E06D 

6FD8FB 

cl r 

[-5,u] 

## 

Send out indirectly OOh to X D/A; i.e. flyback 

85 

; 26 

*y = 

Array[01dest]; 


/* Y value at left of screen 

*/ 

86 

E070 

8E0002 

ldx 

#_Array 

## 

While this is happening get Array[01dest] 


87 

E073 

F60001 

ldb 

.Oldest 




88 

E076 

4F 

cl ra 





89 

E077 

E68B 

ldb 

d,x 




90 

E079 

E7D8F9 

stb 

[-7,u] 

## 

and put it indirectly into the Y D/A converter 

91 

; 27 

for(i 

=0;i<5;i++) {;} 


/* Delay 

*/ 

92 

E07C 

5F 

cl rb 





93 

E07D 

ED5E 

std 

-2, u 

## 

i =0 


94 

E07F 

AE5E L121: ldx 

-2, u 

## 

Get i into X 


95 

E081 

8C0005 

cmpx 

#5 

## 

i<5? 


96 

E084 

2C08 

jbge 

L131 

## 

IF not THEN jump out of for delay loop 


97 

E086 

6C5F L141: inc 

—1, u 

## 

Double-precision increment of 16-bit int i, 

i ++ 

98 

E088 

2602 

jbne 

L01 




99 

E08A 

6C5E 

i nc 

-2, u 




100 

E08C 

20F1 L01 

: jbr 

L121 




101 

; 28 

*z = 

BLANK_OFF; 


/* Blank off 

*/ 

102 

E08E 

6FD8F7 L131: clr 

[-9,u] 

## 

Send out indirectly 0000 0000b to Z 


103 

; 29 

} 






104 

E091 

20A9 

jbr 

L14 

## 

Do another scan; forever 


105 

; 30 

} 
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106 

107 

108 

109 

110 
111 
112 
113 


Table 14.3 (continued! 6809 code resulting from Tables [T4l] and \TT 2l 

! it ************ sV * * it ****************** it *************** it * * it * * * * * * ***** * * * it ******** * * 

* This is the NMI interrupt service routine which puts the analog sample in the 


ENTRY 

ENTRY 

EXIT 


Via NMI and startup 
Array[] and Oldest are global 
Value held at a_d in Array[01dest], 


Oldest incremented with wraparound 


* it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it it 


void update(void) 


114 

; 39 

{ 





115 

E093 

3440 .update 

pshs 

u 

## Open a frame 

116 

E095 

33E4 

leau 

» S 

## 

With U as T0F 

117 

E097 

327E 

leas 

-2, s 

## 

of two bytes 

118 

; 40 

volatile unsigned char * const a_d = ANINPUT;/* This is the Analog input port*/ 

119 

E099 

CC6000 

Idd 

#6000h 

## 

to locate the constant 6000 

120 

E09C 

ED5E 

std 

-2, u 



121 

; 41 

Array[01dest++] = 1 

a_d; 


/* Overwrite oldest sample in Array[] & inc 

122 

E09E 

8E0002 

Idx 

#_Array 

## 

Point x to Array[0] 

123 

E0A1 

F60001 

Idb _ 

Oldest 

## 

Get Oldest 

124 

E0A4 

7C0001 

i nc _ 

Oldest 

## 

01dest++ 

125 

E0A7 

4F 

cl ra 


## 

16-bit 01dest++ 

126 

E0A8 

308B 

leax 

d,x 

## 

Point X to Array[0]+01dest++; ie Array[01dest++] 

127 

E0AA 

E6D8FE 

Idb 

[-2,u] 

## 

Get indirectly the contents of A/D; ie of 6000h 

128 

E0AD 

E784 L61: 

stb 

, x 

## 

Put it away as the latest entry into the array 

129 

; 42 

} 





130 

E0AF 

32C4 

leas 

, u 

## 

Close the frame 

131 

E0B1 

35CO 

pul s 

u, pc 



132 



public 

.update 



133 



public 

_mai n 



134 



psect 

bss 

## 

Space on RAM for 

135 

0001 .Oldest: 

byte 

[1] 

## 

the 1-byte (char) object Oldest 

136 



public 

Oldest 



137 

0002 .Array: 

byte 

[256] 

## 

and 256-byte array 

138 



public .Array 

## 

both of which are external, that is public 

139 



end 





Communication between the background function mai n () and interrupt func¬ 
tion updateO is handled via the two global objects Oldest and Array[], By 
defining these outside any function (lines C3 and C4) the compiler has placed 
their base labels _01 dest and _Ar ray (lines 134 - 138) in absolute memory. One 
byte has been reserved for the former and 256 for the latter. Both labels have been 
declared public, and thus are known to all, including files compiled/assembled 
externally. Both objects are in the _bss program sector, which is used by this 
compiler for static and extern data with no initial values. C specifies that 
these should be pre-initialized by default to zero, and as they lie in RAM, this 
should be done by the startup routine. However, in this instance I have chosen 
explicitly to clear them at the C level in mai n() at lines C13 -C15. 

The linker has been configured to commence the _bss section at 0001 h (as 
location OOOO/i, the null pointer, should never be used), which locates _01dest 
at 0001 h and _Array at 0002 h. Similarly _text begins at E000h. The program 
shown in Table fhOI however, commences at EOOD/t. The missing 12 bytes are 
taken up by the startup routine, which is an assembly-level routine linked in 
before the compiled file. 
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Table 14.4 The 6809 Time Compressed Memory Startup. 

1 .processor m6809 

2 ; 

3 ; C STARTUP FOR 6809 Time Compressed Memory 

4 ; With primitive interrupt handling and no initialization of statics & globals 


5 



. external 

_mai n, 

_update 

6 



.public 

_exit, 

NMI, start 

7 

E000 

10CE0400 start: 

Ids 

#0400h 

; Set Stack Ptr to top of 6116 RAM 

8 

E004 

BDE00D 

jsr 

_mai n 

; Execute main() 

9 

E007 

20F7 _exit: 

bra 

start 

; IF return THEN repeat 


10 ; Now follows the NMI stub leading to update() 

11 ; It is reached from the vector table 


12 E009 BDE093 

NMI: 

: jsr 

.update 

; Go to update() 

13 E00C 3B 


rti 


; On return terminate service 

14 


. end 



(a) Startup code. 





1 ; Table of vectors, 

all 

point Into 

the startup module 

2 


.processor 

m6809 


3 


.list 

+.text 


4 


.psect 

_text 

; Link In at E7F6h for a 2716 

5 


.external 

NMI, 

start 


6 ; The NMI service routine stub is in the startup routine, as is start 

7 E7F6 E000 .word start, start, start, NMI, start 

8 E7F8 E000 

9 E7FA E000 

10 E7FC E009 

11 E7FE E000 

12 ; Five vectors; namely FIRQ, IRQ, SWI, NMI, Reset. All go to the start 

13 ; except NMI, which goes to the NMI stub 

14 .end 

(b) Vector table linked in after the C code. 

The startup routine, shown in 'Fable ITT-fl a) has three functions. The first is 
to set the Stack Pointer to the top of the System stack. Hence in line 7 I have 
put this at the top of the 6ii6 RAM. If the library routines malloc()Q] (Memory 
ALLOCate) and other related functions are being used, then this can be lowered 
somewhat and memory above used as a general storage pool (called the heap). 

The second purpose of this startup routine is to go to the main C function. 
This is implemented as a simple JSR _mai n in line 8. In this case, startup.s 
does not pass any parameters to mai n(). mai n() is an endless loop and so no 
return should occur, but if it does, a skip back to the beginning is actioned. The 
re-entry point is labelled _exi t, and can be reached from the C level by calling the 
library routine exit(). exit() is supposed to return True or False to indicate 
an error condition, but no use is made of this in our situation. 

The final function deals with NMI interrupt handling. Function update () is 
terminated with a Return From Subroutine operation (implemented in line 131 
of Table fl43l with a PULS PC) and therefore cannot be directly entered from an 
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Table 14.5 The machine-code file for the 6809-based time-compressed memory. 
e093 _update 
eOOd _main 
e009 NMI 
e007 _exit 
eOOO start 
0001 .Oldest 
0002 .Array 

e0b8 _prog.top 

0001 _data_top 

0102 _stack_bottom 

0400 _stack_top 

e7f6 a:vecttcm9.o 
$ 


20E0000010CE0400BDEOOD20F7BDE0933B344033E43277CC2000ED5BCC2001ED59CCAOOOEB 

20E02000ED577F00014F5FED5EAE5E8C01002C0C6F8900026C5F26026C5E20EDF60001E7B0 

20E040005D4F5FED5EAE5E8C01002C1CEC5EE7D8FBE65D4FE35E4F1F01E6890002E7D8F91A 

20E060006C5F26026C5E20DDC6FFE7D8F76FD8FB8E0002F600014FE68BE7D8F95FED5EAED2 

20E080005E8C00052C086C5F26026C5E20F16FD8F720A9344033E4327ECC6000ED5E8E0048 

18EOAOOOO2F6OO017CO0O14F3O8BE6D8FEE78432C435CO3B32C435C0BO 

OAE 7 F 6 OOEOOOEOOOEOOOEOO 9 EOOOBO 

O3EOB8OOE0BB0OCA 

00E000011F 


interrupt. Instead, the startup routine has a stub in lines 12 and 13, which is la¬ 
belled NMI. This stub simply jumps to update () (I SR _update) and terminates 
on return with RTI. If the address NMI is placed in the NMI vector, then on re¬ 
ceiving such an interrupt, the processor will jump to NMI (E009/r here) and again 
jump to update (). The way back is a similar RTS-RTI double hop. As all registers 
are saved on entry, no other action need be taken. 

The vector routine of Table fTOl bl is linked in after the C code and begins at 
E7F6h, which is the FIRQ vector in the 2716 EPROM. All vectors are specified to 
point to the beginning of the startup routine (EOOO/r ), except the NMI vector. The 
addresses start and NMI have been broadcast by the startup routine as public 
and declared external by the vector routine. 

The end production of the compilation/assembly and linkage of these three 
files is the Intel-format machine-code file of Table frOI This is used as the input 
to the EPROM programmer or in-circuit emulator. In total there are 178 bytes of 
EPROM text plus the ten Vector bytes. 

The double-hop interrupt handling technique will work with any compiler. 
However, most compilers specifically designed to produce ROMable code support 
extensions to the ANSII standard, enabling the user to declare a function as an 
interrupt handler (See Section 10.2). The function name, in our case update, is 
then entered into the Vector table directly in the normal way. This direct entry 
should decrease the response time to an interrupt and at the same time reduce 
the code emitted by the compiler. 

This particular compiler uses the directive Oport to designate a function in 
t hi s way, the function header then becoming @port updateQ. It is instructive 
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106 


141 

142 


/* 


Table 14.6 The ©port directive. 


* ** * *** ****** *** *** * ** **************************** * * * * * * * * * * * * * * * * * * * * * * * * 


107 

32 

4 This is the NMI 

i interrupt 

service routine which puts the analog sample in 

108 

33 

* ENTRY 

Via NMI 

and startupl 


* 

109 

34 

4 ENTRY 

Array[] 

and Oldest are global 

* 

110 

35 

4 EXIT 

Value held at a_d 

in Array[01dest], Oldest incremented with wrap 

111 

36 

***************** 

**********************************************************/ 

112 

37 







113 

38 

©port updateO 





114 

39 

{ 






115 

E093 

BDE0B2 

.update: 

jsr 

c_cstk 

## 

Save registers if FIRQ 

116 

E096 

33E4 


leau 

, s 

## Open a frame 

117 

E098 

327E 


leas 

-2, s 

## 

With U as T0F 

118 

E09A 

8D03 


jbsr 

L02 

## 

Do the core code 

119 

E09C 

3262 


leas 

2, s 

## 

Close frame 

120 

E09E 

3B 


rti 


## 

Return from interrupt 

121 

40 

volatiie 

unsigned 

char 4 const a_d = 

ANINPUT;/ 4 This is the Analog input port 4 

122 

E09F 

CC6000 

L02: 

ldd 

#6000h 

## 

The constant 6000h put into frame 

123 

E0A2 

ED5E 


std 

-2, u 



124 

41 

Array[01dest++] = 

4 a_d; /* Overwrite 

oldest sample in Array[] & inc Oldest idx 

125 

E0A4 

8E0002 


ldx 

#_Array 

## 

Point x to Array[0] 

126 

E0A7 

F60001 


ldb 

.Oldest 

## 

Get Oldest 

127 

E0AA 

7C0001 


i nc 

.Oldest 

## 

01dest++ 

128 

E0AB 

4F 


cl ra 


## 

16-bit 01dest++ 

129 

E0AC 

308B 


leax 

d,x 

## 

Point X to Array[0] + 01dest++ 

130 

E0AE 

E6D8FE 


ldb 

C-2,u] 

## 

Get contents of A/D, that is of 6000h 

131 

E0B0 

E784 

L61: 

stb 

, x 

## 

Put it away as the latest entry 

132 

42 

} 






133 

E0B1 

39 


rts 

;## 

Return to stub above 

134 




.public 

_update 



135 




.public 

mai n 



136 




.psect 

bss 

## 

Space on RAM for 

137 

0001 


.Oldest: 

. byte 

[1] 

## 

the 1-byte (char) object Oldest 

138 




.public 

Oldest 



139 

0002 


_Array: 

. byte 

[256] 

## 

and 256-byte array 

140 




.public 

_Array 

## 

both of which are external 


.external 

.end 


c_cstk 


(a) Resulting code. 


0xe0a9 

6d62 c_cstk: 

tst 

2, s 

## 

Check E flag 

OxeOab 

2bl3 

bmi 

OxeOcO 

## 

IF 0 THEN forget about the rest 

OxeOad 

327f 

1 eas 

-1, s 

## 

ELSE make a copy on Stack 

OxeOaf 

341f 

pshs 

cc,d,dp,x 

## 

of the registers used by compiler 

OxeObl 

a669 

Ida 

9, s 

## 

in the correct order 

0xe0b3 

8a80 

ora 

#0x80 

## 

setting E = 1 

0xe0b5 

a7e4 

sta 

0, s 

## 

to mimic a IRQ/NMI type Stack 

0xe0b7 

ae67 

ldx 

7, s 



0xe0b9 

10af66 

sty 

6, s 



OxeObc 

ef68 

stu 

8, s 



OxeObe 

6e84 

jmp 

0, x 

## 

Return without altering SP 

OxeOcO 

39 

rts 

;## 

Exit point for IRQ/NMI type interrupt 


(b) A disassembly of the library routine c_cstk. 


to look at the code produced, which is shown in Table 114.61 a). Here we can 
see the RTI in line 120, but also the RTS in line 133. What has happened, is 
that the original code has been cocooned by the RTI at the end and a library 
subroutine c_cstk at the beginning. As you will remember, the 6809 has three 
interrupt inputs: NMI, IRQ and FIRQ. The two former save all internal registers 
on the System stack and retrieve them, whilst the latter saves only the CCR and 
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Table 14.7 Using _asm() to terminate a NMI/IRQ type interrupt service function. 

■ 3i/***************************************** * * * * * **************************************** 

; 32 * This is the NMI service routine which puts the analog sample in the array and update 

; 33 * ENTRY : Via NMI and startup 

; 34 * ENTRY : Array[] and Oldest are global 

; 35 * EXIT : Value held at a_d in Array[01dest], Oldest incremented with wraparound @ 256 

■ 30 ******************************************* * ****************************************** 

■ 37 

; 38 void update(void) 

; 39 { 

_update: pshs u 

leau ,s 
leas - 2 ,s 

; 40 volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */ 
ldd #6000h 
std - 2 ,u 

; 41 Array[01dest++] = *a_d;/* Overwrite oldest sample in Array[] and inc Oldest index */ 


ldx 

#_Array 

ldb 

_01dest 

i nc 

_01dest 

cl ra 


1 eax 

d, x 

ldb 

[-2 ,u] 

stb 

, x 


; 42 _asm("LEAS ,U\nPULS U,PC\nRTI ; Wrap up frame and return to main \n"); 


LEAS 

,u 

;## 

Three inserted assembler-level instructions 

PULS 

U,PC 

;## 

to wrap up frame 

RTI 


;## 

and return from interrupt 

1 eas 

, u 

;## 

These 2 instructions are now dead code; ie never entered 

pul s 

u, pc 




.public _update 
.public _main 
.psect _bss 
_01dest: .byte [1] 

.public _01dest 
_Array: .byte [256] 

.public _Array 
.end 


PC. c_cstk, shown disassembled (see page 13 8 5) in Table []A6jb), first checks the 
E flag. If E is clear then a NMI or IRQ interrupt service is in progress and nothing 
further needs doing. If not, the E flag is cleared and all the registers are put into 
the System stack to pretend that the FIRQ is really an IRQ/NMI type interrupt. 

Table ll4.6l shows us that although @port is deceptively simple at the C level, it 
neither improves the speed nor reduces the size of the resulting code. Knowing, 
as we d o, tha t updateO is entered via an NMI, we could simply alter line 131 
of Table 1X01 in its source form to PULS U : RTI, before letting it through to the 
assembler. This is messy, and as an alternative the function: 

_asm ("LEAS, U\n PULS U\n RTI\n") 

is used to insert three assembly-level instructions. The first two close the frame, 
whilst an RTI terminates the interrupt routine. This is shown in Table 114.71 
line C42. The principle could be extended to FIRQ by saving registers at the 
beginning, and pushing them out at the end. Incidentally, _asm() could also be 
used to implement the startup routine as a front to the C code. 
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All three approaches are non-portable and error prone, so in the majority of 
cases a stub approach is best, if rather slow. The @port solution gives 192 bytes 
whilst using _asm() yields 179 bytes. Creatively editing the source file is the 
most efficient of all, giving a total of 175 bytes. These figures take account of the 
removal of the stub from startup, s, but not the vector table. Creative editing, 
whilst being efficient, is the most dangerous, as it does not show up in the source 
of any of the constituent files, and, unless extremely well documented, will cause 
havoc if any but the original designer tries to make subsequent changes. 

We will compare the resulting machine code to a hand-assembled version in 
Section 16.1, but the question must be asked here: can the resulting machine 
code be reduced in size, knowing the way the compiler produces such code. 

Two possibilities spring to mi nd. As we have said the 6809 does not handle 
16-bit quantities with any finesse. If we could use a char-sized i, instead of 
short, a considerable economy should be achieved. This can be done, if rather 
inelegantly, by defining i as unsi gned char and replacing the statement: 

for (i=0; i<256, i++) 

{body;} 

by: 

i = 0; 

do 

{ 

body; 
i++; 

} whi1e (i !=0) ; 

Here i will be 1 after the first pass, and the while argument will be True. When i 
reaches 255, then i ++ will wrap around to 0 and the whi 1 e argument will return 
False, causing the do...whi 1 e loop to exit. 

This structure is of course only relevant to loops of 256 iterations on an 8-bit 
machine, and presupposes an 8-bit char. 

A further reduction can be obtained if the compiler's treatment of pointer 
constants, such as a_d in lines 26-33 of Table [TOl is studied. There are four 
such constants in our program, and each is put into the frame on entry to the 
function, for example: 

119 LDD #6000h ; the constant a_d 

120 STD -2,1) ; in the frame at T0F-2 and TOF-1 

Once in the frame they can be used as a pointer via Indirect addressing, for in¬ 
stance [-2 , U] = 6000h. With mai n() this stack initialization is done only once 
on entry, and execution proceeds to the core endless loop. The same setup oc¬ 
curs on each entry to update(); however, this will happen around 128 times per 
second! 

It is not necessary to store constants in the essentially dynamic frame; it is 
better to use absolute locations. This can be done by defining such pointers as 
stati c; for example: 
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static volatile unsigned char 


const a_d = ANINPUT; 


which reads a_d is a const pointer/ to a volatile unsigned char/ is stored 
statically/ and has an initial value of ANINPUT (i.e. 6000 h). The combination of 


Table lT~4T8l Optimized 6809 code (continued next page). 
pour MC6809 (COSMIC-France) 


; Compilateur C 

.list + 

.psect _text 

; 1 /* Version 07/12/89 */ 

; 2 #include <hard_09.h> 

L5_x: .word 2000h ;## 1st word in the txt sect (ROM) holds the pointer constant 2000h 
L51_y: .word 2001h ;## Next in absolute memory is the constant 2001h (Y amplifier port) 
L52_z: .word OAOOOh ;## and AOOOh the Z-blank port 

3 unsigned char Array [256]; /* Global array holding display data */ 

4 @dir unsigned char Oldest; /* Index to the Oldest inserted data byte (left point on 

5 

6 main() 

7 { 

main: pshs u 

leau ,s 
leas -2,s 

8 unsigned char i; 

9 unsigned char leftmost; 


/* Scan counter 
/* The initial array index when x is 0 


; io 

static 

unsigned char * 

const x = ANAL0G_X; /* x points to 

a byte @ 

(address) 

ANAL0G_X 

; 11 

static 

unsigned char * 

const y = ANAL0G_Y; /* y points to 

a byte @ 

(address) 

ANALOG Y 

; 12 

static 

unsigned char * 

const z = Z_BLANK; /* The z-mod port (digital port) 

*/ 

; 13 

Oldest 

= 0; 

/* Start New index at 

beginning 

of the array */ 


cl r 

_01dest 





; 14 

i=0; 







cl r 

—1, u 





; 15 

do 


/* Clear array 



*/ 

; 16 

{Array[i]=0; i++;} while(i!=0); 




LI: 

ldx 

#_Array 

;## First do the body statements 





ldb 

-1, u 

;## i is now a char; and is at U-l 





cl ra 



cl r 

d,x 





inc 

-l,u ;## i++ 





Ida 

-l,u ;## Then do the test i 

! = 0 




jbne 

LI 




; 17 

while(l) 

/* Do forever display contents of 

array 

V 

; is 

{ 





; 19 

leftmost = Oldest; /* Make the leftmost 

point on the screen the 

oldest 

sample 

L13: 

ldb 

_01dest 





stb 

-2, u 




o 

INI 

i = 

0; 





cl r 

-l,u 




; 21 

do 




; 22 


{ 




; 23 


*x = (unsigned char)i; /* Send x 

co-ordinate to X plates 


V 

L15: 

ldb 

—1, u 





stb 

[L5_x] 




; 24 

cl ra 

*y = Array[(leftmost+i)&0x0ff]; /* 

and the display byte to 

the y 

D/A */ 


addb 

-2, u 





rola 






cl ra 
tfr 

d,x 





ldb 

Array,x 





stb 

[L51_y] 




; 25 

inc 

} while(++i1=0); 

-1,u ;## i++ 
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Table lT~4T8l Optimized 6809 code (continued next page). 




ldb 

-l,u ;## Once 

again note the test after the body is executed 




jbne 

L15 



; 26 


*z 

= BLANK_ON; 

/* Blank out for flyback 

*/ 



ldb 

#255 





stb 

[L52_z] 



; 27 


*x 

= 0; 

/* Move to right of screen 

*/ 



cl r 

[L5_x] 



; 28 


*y 

= Array[01dest]; 

/* Y value at left of screen 

V 



ldx 

#_Array 





ldb 

_01dest 





ldb 

d,x 





stb 

[ L 51—y ] 



; 29 


for(i=0; i<5; i++) {;} 

/* Delay 

*/ 



cl r 

-1, u 



L101 


ldb 

-l,u 





cmpb 

#5 





jbhs 

Llll 



L121 


i nc 

-1, u 





jbr 

L101 



; 30 


*z 

= BLANK_0FF; 

/* Blank off 

*/ 

Llll 


cl r 

[L52_z] 



; 31 


} 






jbr 

L13 



L53_a_d 

.word 

6000h ;## The 

static pointer constant 6000h is also stored in 

EPROM 

; 32 

} 





; 33 

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * -.V * * * * ****************************** 

******* 

; 34 

* 

This is 

the NMI interrupt service routine which puts the analog sample in the 

array 

; 35 

* 

ENTRY : 

Via NMI and startup 



; 36 

* 

ENTRY : 

Array[] and Oldest are 

global 


; 37 

* 

EXIT : 

Value held at a_d in Array[01dest], Oldest incremented with wraparound @ 256 


38 **************************************************************************************/ 

39 

40 void update(void) 

41 { 

42 static volatile unsigned char * const a_d = ANINPUT;/* This is the Analog input port*/ 

43 Array[01dest++] = *a_d;/* Overwrite oldest sample in Array[] & inc Oldest index modulo 
_update: ldx #_Array 

ldb _01dest 
inc _01dest 
cl ra 

leax d,x 
ldb [L53_a_d] 

L01: stb , x 

; 44 } 

rts 

.public _update 
.public _main 
_01dest: .psect zpage 
.byte [1] 

.public _01dest 
.psect _bss 
_Array: .byte [256] 

.public _Array 
. end 


;## Oldest is stored in page zero (direct page) at OOOlh 


;## Array is stored in the normal _bss area, starting @ OlOOh 


(a) Resulting assembly code. 
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Table 14.8 (continued) Optimized 6809 code. 


e073 

_update 

e013 

_mai n 

e009 

NMI 

e007 

_exit 

eOOO 

start 

0100 

_Array 

e084 

_prog_top 

0100 

_data_top 

0200 

_stack_bottc 

0400 

_stack_top 

e7f6 

a:vecttcm9.o 

0001 

$ 

_01dest 


:20E0000010CE0400BDE01320F7BDE0733B20O02001A000344033E4327E0F016F5F8E010083 
:20E02O00E65F4F6F8B6C5FA65F26F2D601E75E6F5FE65FE79FE00D4FEB5E494F1F01E68909 
:20E04O000100E79FE00F6C5F26E7C6FFE79FE0116F9FEO0D8E0100D601E68BE79FE00F6F80 
:20E060005FE65FC10524046C5F20F66F9FE01120BA60008E0100D6010C014F308BE69FE012 
:04E0800071E7843987 
:0AE7F600E000E000E000E009E000B0 
: 01E08C000093 

:08E08400E08C7A0001E08D0040 
:00E000011F 

(b) Executable code. 


the qualifiers stati c and const tells the compiler to put constants in ROM, that 
is the _text section. The compile-time nature of these constants is clearly seen 
in lines 6-8 of Table [TOl where they are placed in EPROM at locations E00D- 
E01 2 h. This saves 4x3 bytes and results in quicker execution (it also makes 
the code easier to read). Defining const pointers externally is an alternative to 
a stati c declaration, see Table 11531 Notice that updatef) no longer requires a 
frame. 

Defining Oldest to lie in zero page (with the @di r prefix in line C4) saves 
another few bytes, giving a total size of 132 bytes, plus vectors. Table [TO] uses 
the startup stub entry for the interrupt entry to updatef). A further few bytes 
may be saved at the expense of portability by using _asm(). 

Another possibility, not implemented in Table 114.81 is to replace the array 
representations in the three loops by equivalent pointer constructions. As these 
loops walk through the array, this procedure should be more effective (see Sec¬ 
tion 9.2). However, the saving is illusionary in this rather efficient compiler [21- 
See Table [T531 for an example of this technique. 


14.3 68008 - Target Code 

Although the target of Fig. ll3.3l is based on a 68008 processor, the hardware and 
address map were chosen to resemble that of the 6809 equivalent of Fig. 113.11 
This is reflected in the header file hard_68k. h included in Table 114.91 which is 
similar to hard_09. h. If there were changes in the memory map, then the header 
file would be suitably altered, whilst the remainder of the C code would remain 
unchanged (see also Section 15.2). Major changes in the input/output circuitry 
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could be handled by including the I/O functions appropriate to the hardware. In 
such cases the ma in body of C code still remains portable (see Section 10.4). 

A complete listing of 68000 assembly-level code intermingled with the original 
C source, as produced by the Intermetrics/COSMIC 68000 C cross-compiler V3.2, 
is given in Table |T4~9l 

I have added self explanatory comments, as indicated by the prefix *##, and 
thus this code will not be discussed in any detail in the text. There are, never¬ 
theless, some points which should be noted. First the regi ster variables i and 
1 eftmost have indeed been placed as requested in registers D5 [ 1 5:0] and D4[7:0] 
respectively. As i is a short variable, 16 bits have been reserved, whereas 8 bits 
is sufficient for char leftmost. 

ANSI C s pecif ies that chars and shorts are promoted to i nts during process¬ 
ing (see Fig. 18.41) . This can clearly be seen in lines 55-59, where the unsigned 
8-bit char 1 eftmost is added to the (signed) 16-bit short i . The former is first 
unsigned promoted to 32 bits (i.e. a 32-bit i nt) as follows: 

57 M0VEQ.L #0,D7 * A 32-bit clear 

58 MOVE.B D4,D7 * Lower 8 bits to D7; D7 = 000000|1eftmost 

Then the 16-bit i is sign extended: 

59 MOVE.W D5,D6 * 16-bit i to D6[15:0] 

60 EXT.L D6 * Sign extended to 32 bits; that is D6[31:0] 


Table fM~9l 68000 code resulting from Tables fR.l l and fl 4.21 (continued next page). 


~1WSL 3.0 as68k Sat Dec 02 14:13:38 1989 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 


/* Version 02/12/89 
#include <hard_68k.h> 

#define ANAL0G_X (unsigned char *)0x2000 /* Analog output to X amplifier */ 
#define ANAL0G_Y (unsigned char *)0x2001 /* Analog output to Y amplifier */ 
#define ANINPUT (unsigned char *)0x6000 /* Analog input port at 6000h */ 
#define SWITCH (unsigned char *)0x8000 /* Digital input port at 8000h */ 
#define Z_BLANK (unsigned char *)0xA00O /* Digital output port at AOOOh */ 
#define RAM_START (unsigned char *)0xE00O /* 6264 chip starts at location E000h*/ 


#define RAM_LENGTH 


0x2000 /* 6264 byte capacity is 8K or 2000h */ 


8 #define R0M_START (unsigned short *)0x0000/* 2764 chip starts at location 0000h*/ 


9 #define R0M_LENGTH 

12 *10 #define BLANK_0N 

13 *11 #define BLANK_0FF 

3 unsigned char Array [256]; 

4 unsigned char Oldest; 


14 

15 

16 * 5 

17 * 6 mainO 

18 * 7 { 

19 

20 

21 00418 4e56 fff4 _main: 

22 0041C 48e7 OcOO 

23 * 8 register short int 


0x2000/* 2764 byte capacity is 8K or 2000h */ 
OxFF /* Bit pattern to blank out beam */ 

0 /* Bit pattern to enable beam */ 

/* Global array holding display */ 

/* Index to the Oldest inserted */ 


. text 
. even 

link a6,#-12 *## Frame of 3 words with A6 as FP (T0F) 

movem.l d5/d4,-(sp) *## D4/D5 not to be changed by any ftn 

/* Scan counter */ 

/* The initial array index when x is 0 */ 

/* x points to a byte @ (address) ANAL0G_X*/ 


24 * 9 register unsigned char leftmost; 

25 *10 unsigned char * const x = ANAL0G_X; 

26 00420 2d7c 00002000fffc move.1 #0x2000,-4(a6) *## Pointer'constant 2000h @ T0F -4/-1 

27 *11 unsigned char * const y = ANAL0G_Y; /* y points to a byte @ (address) ANAL0G_Y*/ 

28 00428 2d7c 00002001fff8 move.1 #0x2001,-8(a6) *## Likewise constant 2001h @ TOF-8/-5 


29 *12 unsigned char * const z = Z_BLANK; /* The z-mod port (digital port) 


30 00430 2d7c 0000a000fff4 move.l 

31 *13 Oldest = 0; 

32 00438 4239 OOOOeOOO clr.b 


*/ 


#0xa000,-12(a6)*## Likewise constant AOOOh @ TOF-12/-9 
/* Start New index at beginning of array */ 
.Oldest *## _01dest lives in absolute memory @ EOOOh 
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Table im^l 68000 code resulting from Tables PR. 1 1 and PI 4.2\ (continued next page). 


33 

*14 for(i=0; i<256; i++) 


/* 

Clear array 

*/ 

34 

0043E 

4245 

cl r.w 

d5 

*## 

D5(15:0) holds register short i 


35 

00440 

Oc45 0100 LI: 

cmpi.w 

#256,d5 

*## 

Is i beyond 255? 


36 

00444 

6c Oe 

bge.s 

L14 

*## 

IF yes THEN exit clear for loop 


37 

*15 

{Array[-i]=0;} 






38 

00446 

227c 0000e002 

move.1 

#_Array,al 

*## 

ELSE point Al to Array[0] each time thru 

39 

0044C 

4231 5000 

cl r.b 

(al,d5.w) 

*## 

Clear Array[i] 


40 

00450 

5245 

addq .w 

#l,d5 

*## 

i ++ 


41 

00452 

60 ec 

bra. s 

LI 

*## 

and repeat 


42 

*16 while(l) 


/* 

Do forever display contents of array 

*/ 

43 

*17 

{ 






44 

*18 

leftmost = Oldest; /* Make the leftmost 

point on screen the oldest sample 

*/ 

45 

00454 

1839 OOOOeOOO L14: 

move.b . 

.Oldest,d4 

*## 

Now make leftmost (in reg D4) = .Oldest 

46 

*19 

for (i=0; i<256; i++) 





47 

004 5A 

4245 

cl r.w 

d5 

*## 

i =0 


48 

0045C 

Oc45 0100 L16: 

cmpi.w 

#256,d5 

*## 

i>2 5 5 yet? 


49 

00460 

6c 2a 

bge.s 

L17 

*## 

IF yes THEN end scan for loop 


50 

*20 

{ 






51 

*21 

*x = (unsigned char)i; /* 

Send 

x co-ordinate to X plates 

*/ 

52 

00462 

226e fffc 

move.l 

~4(a6),al 

*## 

Get pointer constant 2000h (ie x) 

to Al 

53 

00466 

le05 

move.b 

d5,d7 

*## 

Move lower 8 bits of i into D7[7:0] 

54 

00468 

1287 

move.b 

d7, (al) 

*## 

and then send it to x 


55 

*22 

*y = Array[(1eftmost+i)&0x0ff] 

; /* 

and the display byte to the Y D/A 

*/ 

56 

0046A 

226e fff8 

move.1 

-8(a6),al 

*## 

Get pointer constant 2001h (y) to 

Al 

57 

0046E 

7e00 

moveq.1 

#0, d7 

*## 

Move 8-bit leftmost extended to 32 

l-bit 

58 

00470 

le04 

move.b 

d4,d7 

*## 

int to D7 


59 

00472 

3c05 

move.w 

d5,d6 

*## 

Get i to D6[15:0] 


60 

00474 

48c6 

ext. 1 

d6 

*## 

and extend to 32-bit int 


61 

00476 

de86 

add. 1 

d6,d7 

*## 

Add them in int form = 1eftmost+i 


62 

00478 

0287 OOOOOOff 

andi.1 

#255,d7 

*## 

Reduce to 8-bit (1eftmost+i)&0xff 


63 

0047E 

2447 

move.1 

d7,a2 

*## 

Put this array index in A2 


64 

00480 

d5fc 0000e002 

add.l 

#Jr ray,a2 

*## 

+ to Array gives address of Array[index] 

65 

00486 

1292 

move.b 

(a2),(al) 

*## 

Move to y 


66 

*23 

} 






67 

00488 

5245 

addq .w 

#l,d5 

*## 

i ++ 


68 

0048A 

60 dO 

bra. s 

L16 

*## 

and repeat scan 


69 

*24 

*z = BLANK_ON; 


/* 

Blank out for flyback 

*/ 

70 

0048C 

226e fff4 L17: 

move.l 

-12(a6),al 

*## 

Get constant pointer to z into Al 


71 

00490 

12bc OOff 

move.b 

#-l,(al) 

*## 

Send 1111 1111 to z 


72 

*25 

*x = 0; 


/* 

Move 

to right of screen 

V 

73 

00494 

226e fffc 

move.1 

-4(a6),al 

*## 

Get constant pointer to x into Al 


74 

00498 

4211 

cl r.b 

(al) 

*## 

x=0 


75 

*26 

*y = Array[01dest]; 

/* 

Y value at left of screen 

*/ 

76 

0049A 

226e fff8 

move.1 

-8(a6),al 

*## 

Al now points to y 


77 

0049E 

247c 0000e002 

move.l 

#Jr ray,a2 

*## 

A2 now points to Array[0] 


78 

004A4 

7e00 

moveq.1 

#0, d7 

*## 

Extend .Oldest to 32-bit int size 



79 

004A6 

le39 

OOOOeOOO move.b 

.Oldest,d7 



80 

004AC 

12b2 

7800 

move.b 

(a2,d7.1),(al)*## Send array[01dest] to y 


81 

*27 

for(i=0; 

i<5; i++) {;} 

/* 

Delay 

*/ 

82 

004B0 

4245 


cl r.w 

d5 

*## i=0 


83 

004B2 

0c45 

0005 

L121: cmpi.w 

#5,d5 

*## i<5? 


84 

004B6 

6c 

04 

bge.s 

L131 

*## IF yes THEN exit from delay 

for loop 

85 

004B8 

5245 


L141: addq.w 

#l,d5 

*## ELSE i++ 


86 

004BA 

60 

f6 

bra. s 

L121 

*## and repeat 


87 

*28 

*z 

= BLANK.OFF; 

/* 

Blank off 

*/ 

88 

004BC 

226e 

fff4 

L131: move.l 

-12(a6),al 

*## Al now points to z 


89 

004CO 

4211 


cl r.b 

(al) 

*## Send 0000 0000 to z 


90 

*29 

} 






91 

004C2 

60 

90 

bra. s 

L14 

*## Repeat the complete scan 



92 *fnsize=86 

93 *30 } 


## Repeat the complete scan 
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Table 14.9 (continued) 68000 code resulting from Tables Udl] and [l4~2\ 

.94 *31 /********************************************************************************* 

95 *32 * This is the NMI interrupt service routine which puts the analog sample in the 

96 *33 * ENTRY : Via NMI and startup 

97 *34 * ENTRY : Array[] and Oldest are global 

98 *35 * EXIT : Value held at a_d in Array[01dest], Oldest incremented with wraparound 

99 *36 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * iV * * * * * iV * * iV * * iV * * * * * * * * * * * * * * * * * 

100 *37 

101 *38 void update(void) 

102 *39 { 

103 .even 


104 004C4 4e56 fffc _update: link a6,#-4 *## Make frame of 1 word for pointer const 

105 *40 volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */ 


106 

004C8 

2d7c 

00006000fffc 

move.1 

#0x6000,-4(a6)*## Constant pointer ANINPUT in T0F- 

107 

*41 Array[01dest++] = *a_ 

-d; 

/* Overwrite oldest sample in Array[] & inc 

108 

004D0 

le39 

OOOOeOOO 

move.b 

_01dest,d7 

*## 

-Oldest to D7[7:0] 

109 

004D6 

5239 

OOOOeOOO 

addq. b 

#1,-Oldest 

*## 

_01dest++ 

110 

004DC 

0287 

OOOOOOff 

and. 1 

#255,d7 

*## Expand to 32-bit int 

111 

004E2 

2247 


move.1 

d7,al 

*## 

A1 now holds array index 

112 

004E4 

d3fc 

0000e002 

add. 1 

#_Array,al 

*## 

A1 now points to Array[01dest] 

113 

004EA 

246e 

fffc 

move.1 

-4(a6),a2 

*## 

A2 now points to ANINPUT 

114 

004EE 

1292 


move.b 

(a2),(al) 

*## 

Put [ANINPUT] into Array[01dest] 

115 

*42 } 







116 

004F0 

4e5e 


unt k 

a6 

*## 

Close up frame 

117 

004F2 

4e75 


rts 


*## 

and return 

118 




*fnsize=110 



119 




.globl 

-update 



120 




.glob! 

_mai n 



121 




. bss 




122 




.even 




123 

0EO00 


-Oldest: 

. = .+1 


*## 

Reserve one byte for char Oldest 

124 




.globl 

-Oldest 



125 




.even 




126 

OE002 


_Array: 

.=.+256 


*## 

Reserve 256 bytes for Array[256] 

127 




.glob! 

-Array 




no assembler errors 
code segment size = 220 
data segment size = 0 


After all this 32-bit ' fiddling around', the sum of these two is truncated by AND- 
ing with 1 11111 1 1 b (OxFF): 

61 ADD.L D6,D7 * 32-bit leftmost+i 

62 ANDI.L #0FFh,D7 * Truncated to 8-bits 

63 MOVEA.L D7,A2 * and moved as a 32-bit offset to A2.L 

It is clear from this discussion that nothing has been gained in making these 
two register variables char and short. Unlike its 6809 counterpart, no provision 
to buck the ANSII promotion requirement is provided by this compiler. We will 
return to this point later. 

Communication between the background mai n () (strictly voi d mai n (voi d)) 
and the interrupt function update 0 is handled via the two global objects, 01 dest 
and Array[]. By defining these outside any function (lines 14 and 15 in Ta¬ 
ble l l 4.91 . the compiler has placed their base labels _01dest and _Array in ab¬ 
solute memory (lines 123-127). One byte has been reserved for the former and 
256 for the latter. However, as Array[] is not a byte object, a hole of one byte 
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is left after _01 dest, to ensure that it starts at an even address (i.e. . EVEN). Both 
labels have been declared . CLOBL, and thus are known to all, through the linker. 
The two labels have been placed in the _bss program section (directive . BSS), 
which is used by this compiler for stati c and extern data with no initial values. 
C specifies that these should be load-time initialized by default to zero, and that, 
by inference, this should be done in the startup routine. However, in this instance 
I have chosen to do this at run time in the mai n () function at the C level, in lines 
33-41. 

The linker has been configured to commence the _bss sector at EOOOh, which 
locates _01 dest at EOOO/i and _Array at E002/i. Program section_text actually 
begins at OOOOh, but the startup routine vector table of Tab I e 11 4. 1 01 brings _mai n 
up above the vector table top (03FFh). 

The startup routine has three functions. The first is to place the initial System 
Stack Pointer address in locations 00000 - 00003 h and Reset address (i.e. initial 
Program Counter value) in 00004 -7h. In addition, the level-7 interrupt auto 
vector, which points to the startup NMI stub, is placed in 0007C - 0007Fh. Other 
vectors could of course be filled in the same manner (see Table ITTDit . Space is 
then reserved up to 003FFh. 

The startup program proper begins at 00400b. This has two purposes. The 
first is to go to the main C routine, which is implemented as a simple 1SR _mai n 
in line 12. No flags require changing in the Status register before this move, as we 
are remaining within the Supervisor state, and the initial Interrupt mask setting 
of 1 1 1 b still permits edge triggered non-maskable interrupts. In our situation, 
no parameters are passed (i.e. through the stack) to main(), and, as this is an 
endless loop, there should be no return. If there is, a move back to the beginning 
is actioned. This re-entry point is labelled _exi t, and can be reached from the 
C level by calling the ANSII library routine exit() 0. exit() is supposed to 
return True or False, to indicate an error condition, but no use is made of this in 
our implementation. 

The final function deals with the level-7 interrupt handler. The update() 
function is ter mi nated with RTS, in line 117 of Table 114.91 and so cannot be 
directly entered via an interrupt. Instead, the startup has a stub, in lines 14- 
17, which is labelled NMI. This address was placed in the vector table earlier in 
line 8. When a level-7 interrupt occurs, the processor goes to this stub. All that 
happens here is that the registers D7, A1 and A2 are pushed onto the System 
stack, and a subroutine Jump (JSR _update) is made to update (). The way 
back is a si mil ar double-hop, with update ()'s RTS returning the processor to 
the stub, the registers then being pulled off the stack followed by a terminating 
RTE. This compiler's house rule always preserves D3, D4, D5, A3, A4, A5 (all of 
which are used for register variables) and A6, A7 (the Frame and Stack Pointers) 
on return from a function. Thus a general interrupt stub need only save DO, D1, 
D2, D6, D7, A0, A1, A2. However, specifically update O only uses D7, A1 and A2. 

The linker places the startup code before the output from the C compiler, giv¬ 
ing the Intel-coded machine-code file of Table 114.111 This is used as the input 
to an EPROM programmer or in-circuit emulator. In total there are 244 bytes of 
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Table 14.10 The 68000 Time Compressed Memory Startup. 

1WSL 3.0 as68k Fri Dec 08 15:09:06 1989 

* Startup code for 68008-based Time-Compressed Memory 

* S.l.Cahill Version 07/02/89 
.text 


00000 
00004 
00008 

8 0007c 

9 

10 00080 
11 

12 00400 

13 00406 

14 00408 

15 0040c 

16 00412 

17 00416 

18 

19 

20 
21 


00010000 

00000400 

00 

00000408 

00 


STARTUP: 


. even 
. 1 ong 
. 1 ong 

. 1 ong 


0x10000 * Initial System Stack Pointer. 
START * The startup code below 
=.+116 * 116 bytes down to IRQ-7 vector 

NMI * IRQ7 service routine below 


The startup routine on reset follows 


4eb9 00000418 START 
60 f8 _exit 

48e7 0160 NMI 

4eb9 000004C4 
4cdf 0680 
4e73 


=.+896 * 896 bytes on at 400h 

jsr _main * Co to main(). 
bra.s START * IF returns THEN restart 
movem.l d7/al/a2,-(sp) * Save used regs 
jsr _update * Co to function updateO 
movem.l (sp)+,d7/al/a2 * Retrieve regs 
rte * return to caller 


Make 


.public _main, _update, _exit 

main(), updateO and _exit known to the linker (i.e. global) 


EPROM text (excluding the fixed vector table). It is interesting to compare this 
with the 178 bytes produced by the 6809 equivalent in the last section. Although 
there are less lines, 68000 instructions tend to be longer than their 6809 coun¬ 
terparts. 

The double-hop interrupt handling technique will work with any compiler. 
However, most compilers with aspirations to produce ROMable code, support 
extensions to the ANSII standard, enabling the user to declare a function as an 
interrupt handler (see Section 10.2). The function name, in our case _update, 
should then be placed directly in the vector table, rather than the stub label. This 
direct entry should decrease the response period to an interrupt, and at the same 
time possibly reduce the code emitted by the compiler. 

This compiler uses the directive @port to designate a function in this way, 
the function heading becoming @port updateO. The code produced by this 
stratagem is shown in Table 114.121 Here, four instructions have been inserted 
into the function code, lines 104- 107. These instructions are virtually identical 
to the stub of Table 114.101 but of course are directly entered at _update. In 
reality no time is saved, as the main body of updateO is unchanged, and is 
simply treated as a subroutine. Thus a double hop still occurs on entry and exit. 

Bonding with the updateO function code can be improved by eschewing the 
use of @port and using the (once again non-standard) _asm() function to insert 
the relevant assembly-level code, as shown in Table fiTTH Thus: 

_asm("movem.1 d7/a0/al,-(sp) * Save used regs on Stack\n"); 

after the opening brace and 
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Table 14.11 Machine-code file from Tables UA .Til and \J4. 1 01 

000004c4 _update 
00000418 _main 
00000000 STARTUP 
00000408 NMI 
00000406 _exit 
00000400 START 
00000440 LI 
0000048c L17 
0000045c L16 
00000454 L14 
000004b8 L141 
000004bc L131 
000004b2 L121 
0000e002 _Array 
OOOOeOOO _01dest 

000004f4 _prog_top 

OOOOeOOO _data_top 

0000el02 _stack_bottom 

00010000 _stack_top 

$ 

:20000000 0001000000000400 000000000000000000000000000000000000000000000000DE Stack/Reset 
:200020000000000000000000000000000000000000000000000000000000000000000000c0 vectors 
:200040000000000000000000000000000000000000000000000000000000000000000000A0 
:2000600000000000000000000000000000000000000000000000000000000000 00000408 74 Level-7 
:20008000000000000000000000000000000000000000000000000000000000000000000060 autovector 
:2000A000000000000000000000000000000000000000000000000000000000000000000040 
:2000C000000000000000000000000000000000000000000000000000000000000000000020 
:2000E000000000000000000000000000000000000000000000000000000000000000000000 
:200100000000000000000000000000000000000000000000000000000000000000000000DF 
:200120000000000000000000000000000000000000000000000000000000000000000000BF 
:2001400000000000000000000000000000000000000000000000000000000000000000009F 
:2001600000000000000000000000000000000000000000000000000000000000000000007F 
:2001800000000000000000000000000000000000000000000000000000000000000000005F 
:2001A00000000000000000000000000000000000000000000000000000000000000000003F 
:2001C00000000000000000000000000000000000000000000000000000000000000000001F 
:2001E0000000000000000000000000000000000000000000000000000000000000000000FF 
:200200000000000000000000000000000000000000000000000000000000000000000000DE 
:200220000000000000000000000000000000000000000000000000000000000000000000BE 
:2002400000000000000000000000000000000000000000000000000000000000000000009E 
:2002600000000000000000000000000000000000000000000000000000000000000000007E 
:2002800000000000000000000000000000000000000000000000000000000000000000005E 
:2002A00000000000000000000000000000000000000000000000000000000000000000003E 
:2002C00000000000000000000000000000000000000000000000000000000000000000001E 
:2OO2EOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOFE 
:200300000000000000000000000000000000000000000000000000000000000000000000DD 
:200320000000000000000000000000000000000000000000000000000000000000000000BD 
:2003400000000000000000000000000000000000000000000000000000000000000000009D 
:2003600000000000000000000000000000000000000000000000000000000000000000007D 
:2003800000000000000000000000000000000000000000000000000000000000000000005D 
:2003A00000000000000000000000000000000000000000000000000000000000000000003D 
:2003C00000000000000000000000000000000000000000000000000000000000000000001D 
:2OO3EOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOFD 

:200400004EB90000041860F848E701604EB9000004C44CDF06804E734E56FFF448E70C00BE Startup 
:200420002D7C00002000FFFC2D7C00002O01FFF82D7C0000A000FFF442390000E000424519 and main() 

:200440000C4501006C0E227C0000E0O242315O00524560EC18390000E00042450C450100A0 
:200460006C2A226EFFFC1E051287226EFFF87E001E043C0548C6DE860287000000FF2447D2 
:20048000D5FC0000E0021292524560D0226EFFF412BC00FF226EFFFC4211226EFFF8247CE9 
:2004A0000000E0027E001E390000E00012B2780042450C4500056C04524560F6226EFFF4AC 
:2004C000421160904E56FFFC2D7C00006000FFFC1E390000E00052390000E000028700000B updateO 
:1404EO0000FF2247D3FC0000E002246EFFFC12924E5E4E754F 
:00000001FF 


_asm("movem.1 (sp)+,d7/a0/al * Pull registers\n"); 

_asm("unlk a6\n rte * Close frame and exit \n"); 

at the close, tightly couples the additional interrupt code to the compiler-emitted 
code. Care must be taken to mi rror any registers pushed out on to the stack by 
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the compiler. _asm() could also, in principle, be used to implement the startup 
code as a front end to the C code. 

These latter two approaches are non-portable, and can be error prone. Thus, 
in the majority of cases a startup stub approach is best, if rather slow. If speed 
and/or space is extremely tight, then the compiler generated assembly-level file 
can be creatively edited. Thus any MOVEM instruction emitted by the compiler 
can be augmented with the registers not left untouched by the routine, and RTS 
replaced by RTE. But this approach is dangerous, as it does not show up in the 
source of any constituent file, and, unless extremely well documented, will cause 
havoc if any but the original designer tries to make subsequent changes. In any 
case, tinkering with intermediate files is not what compiling is all about. 

In the last section we were able to fine tune our C source file, knowing the char¬ 
acteristics of the target processor. The increase in speed and size is of course 
at the expense of portability. Can we do this for the 68000-target version? For 


~1WSL 3.0 as68k 


Table 14.12 77ie@port directive. 
Sat Dec 02 14:20:42 1989 


94 


31 /* 


* * * * * * * * * * * * * * * * * * * * * * V.' * * * * * * * * * * * * * * V.- * * ********** * * * * * * * * * * * * * * * * * * * ****** * * * * * 


95 * 32 * This is the NMI interrupt service routine which puts the analog sample in the 

96 * 33 * ENTRY : Via NMI and startup 

97 * 34 * ENTRY : Arrayf] and Oldest are global 

98 * 35 * EXIT : Value held at a_d in Array[01dest], Oldest incremented with wraparound 

gg * 36 ******************************************************************************** 


*## Co to update() proper 


100 

* 37 





101 

* 38 

©port updateO 



102 

* 39 

{ 




103 




. even 


104 

004B4 

48e7 

e3e0 _update: movem.l 

d0-d2/d6/d7/a0 

105 

004B8 

4eb9 

OOOOOObc 

jsr 

L 6 

106 

004BE 

4cdf 

07c7 

movem .1 

(sp)+,d 0 -d 2 /d 6 

107 

004C2 

4e73 


rte 


108 

004C4 

4e56 

fffc 

L 6 : link 

a6,#-4 

109 

* 40 

volatile unsigned 

char * const 

a_d = ANINPUT; 

110 

004C8 

2d7c 

00006000fffc 

move .1 

#0x6000,-4(a6) 

111 

* 41 

Array[01dest++] = 

*a_d; /* Overwrite oldest s 

112 

004D0 

le39 

OOOOeOOO 

move.b 

_01dest,d7 

113 

004D6 

5239 

OOOOeOOO 

addq.b 

#1,-Oldest 

114 

004DC 

0287 

OOOOOOff 

and. 1 

#255,d7 

115 

004E2 

2247 


move .1 

d7,al 

116 

004E4 

d3fc 

0000 e 002 

add. 1 

#_Array,al 

117 

004EE 

246e 

fffc 

move .1 

-4(a6),a2 

118 

004F0 

1292 


move.b 

(a 2 ),(al) 

119 

004F2 

4e5e 


uni k 

a 6 

120 

004F4 

4e75 


rts 


121 




*fnsize= 

110 

122 




.globl 

-update 

123 




.globl 

_mai n 

124 




. bss 


125 




.even 


126 

0E000 


_ 01 dest: . 

= .+l 

127 




.globl 

-Oldest 

128 




. even 


129 

0E002 


_Array: . 

=.+256 

130 




.globl 

-Array 


*## As Table 14.9 

/* This is the Analog input port*/ 
imple in Array[] and inc index */ 
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example, we have previously observed that the use of regi ster short and char 
objects is counterproductive, as such objects are extended to i nt during most 
arithmetic processes. Neither short i nor char 1 eftmost rely on modulo-256 
wraparound, so they can profitably be redefined as i nts to overcome this addi¬ 
tional processing. 

Most 68000-targeted compilers can be persuaded to define i nt as either a 16 
or 32-bit word. All previous examples have been based on 32-bit i nts. Using 16- 
bit i nts will speed up memory access and ALU processes. However, any address 
arithmetic, such as the calculation of the position of an array element, will require 
conversion to the 32-bit pointer size. 

Where constants are being stored, for example pointers to fixed hardware 
ports, it is not necessary to locate these values dynamically in the frame. We can 
see this run-time setup in line 110 of Table fl4H2l where the constant 6000 h (the 
address of the A/D) is put into the frame on each entry to update(). Constants 
are best stored in absolute locations, preferably in ROM along with the program 
text. In the case of constant pointers, this can be done by defining such objects 
as stati c, for example: 


Table 14.13 Using _asm() to terminate an interrupt sendee function. 

* ^ ^** * ****** **************************** * * *************** * * * * ************** * *********** * 

* 32 * This is the NMI interrupt service routine which puts the analog sample in the array 

* 33 * ENTRY : Via NMI and startup 

* 34 * ENTRY : Array[] and Oldest are global 

* 35 * EXIT : Value held at a_d in Array[01dest], Oldest incremented with wraparound @ 256 

* * * * * * ****************************** * * * * * * * * * * * **************************************** 

* 37 

* 38 void update(void) 

* 39 { 

. even 

_update: link a6,#-4 

* 40 volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */ 

move.l #0x6000,-4(a6) 

* 41 _asm("movem.1 d7/a0/al,-(sp) * Save used regs on Stack \n"); 

movem.l d7/a0/al,-(sp) * Save used regs on Stack 

* 42 Array[01dest++] = *a_d; /* Overwrite oldest sample in Array[] & increment index */ 

move.b _01dest,d7 
addq.b #l,_ 01 dest 
and.1 #255,d7 

move.l d7,al 
add.l #_Array,al 
move.l -4(a6),a2 
move.b (a 2 ),(al) 

* 43 _asm("movem.1 (sp)+,d7/a0/al * Pull registers \n"); 

movem.l (sp)+,d7/a0/al * Pull registers 

* 44 _asm("unlk a 6 \n rte * Close frame and exit \n"); 


unl k 
rte 

a6 

* Close frame and 

exit 

unl k 

a6 

*## These two lines are now 

dead code 

rts 





*fnsize=125 

.glob! _update 
.globl _main 
. bss 
. even 

_01dest: . =.+l 

.globl _01dest 
. even 

_Array: . =.+256 

.globl _Array 
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static volatile unsigned char 


const a_d = ANINPUT; 


which reads a_d is a constant pointer/ to a vol ati 1 e unsi gned char/ is stored 
statically/ and has an initial value of ANINPUT (i.e. 6000/t from the header). The 
combination of static and const tells the compiler to put constants in ROM, 


Table fM.l 41 Optimized 68000-based code (continued next page). 

1 /* Version 08/03/90 


*/ 


* 2 

#include 


. text 


. even 

L5_x 

:: .long 


. even 

L51_ 

.y: .long 


. even 

L52_ 

z: .long 

* 3 

unsigned 

* 4 

unsigned 

* 5 


* 6 

main() 

* 7 

{ 


0x2000 *##The 1st tong word in text section (ROM) holds pntr constant 2000h 
0x2001 *## Next in absolute memory is the constant 2001h (Y amplifier port) 


OxaOOO *## and AOOOh, the Z-blank port 

r Array [256]; /* Global array holding display data 


V 


_main: movem .1 

d5/d4/a5,- 

(sp) 


* 8 

register unsigned char 

* array_prt; /* Pointer into array 

*/ 

* 9 

register int 

i ; 

/* Scan counter 

*/ 

*10 

register int 

leftmost; 

/* The initial array index when x is 0 

*/ 

*11 

static unsigned char * 

const x = ANAL0G_X; /* x points to a byte @ (addr) ANAL0G_X 

*/ 

*12 

static unsigned char * 

const y = ANAL0G_Y; /* y points to a byte @ (addr) ANAL0G_Y 

*/ 

*13 

static unsigned char * 

const z = Z_BLANK; /* The z-mod port (digital port) 

*/ 

*14 

Oldest = 0; 


/* Start New index at beginning of the array 

*/ 


clr.b 

.Oldest 



*15 

for(array_ptr=Array; array_ptr<Array+256; *array_ptr++ =0) {;} /* Clear array 

*/ 


move .1 

#_Array,a5 

*## Make array_ptr (in A5.L) point to bottom of Array 


LI: 

cmp. 1 

#_Array+256,a5 



bge.s 

L14 




move .1 

#_Array,al 

*## array_ptr < Array+256? 



bcc.s 

L14 

*## IF not then exit for loop 



cl r.b 

(a5) + 

*## ELSE clear array element & inc array pntr all at once 


bra. s 

LI 

*## and again 


*16 

while(l) 


/* Do forever display contents of array 

*/ 

*17 

{ 




*18 

leftmost = Oldest; 

/* Make the leftmost point on the screen the oldest sample 

*/ 


cl r.w 

d4 




move.b _ 

.Oldest,d4 



*19 

for (array_ptr=Array, i=0; array_ptr<Array+256;) 



move .1 

# Array,a5 

*## Again make array_ptr in A5.L point to bottom of array 


cl r.w 

d5 

*## i (in D5.W) = 0 


L16: 

cmp. 1 

#_Array+256,a5 *## array_ptr < Array+256? 



bcc.s 

L17 

*## IF not THEN exit for loop 


*20 

{ 




*21 

*x 

= (unsigned char)i; /* Send x co-ordinate to X plates 

*/ 


move .1 

L5 x,al 

*## L5_x in abs memory, thus abs mode used to get pntr to 

X 


move.b 

d5,d7 




move.b 

d7,(al) 



*22 

*y 

= *(array_ptr++ +1eftmost)&0xff; /* and the display byte to the y D/A 

*/ 


move .1 

L 5l_y,al 

*## Same for pntr to Y. Note pntr constants aren't in Stack 


move.w 

a5,a2 

*## Incrementing array_ptr for next time 



addq. 1 

#1, a5 




move.b 

(a2,d4.w),d7 *## array_ptr + leftmost into D7.B 



andi.b 

#0xff,d7 

*## Reduce to modulo-256 ( 8 -bit) 



move.b 

d7,(al) 

*## Put it out to Y port (address of which is in Al) 



*23 
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Table fM. 1 4t Optimized 68000-based code (continued next page). 



bra. s 

L16 

*## and again 


*24 

*z = BLANK_ON; 

/* Blank out for flyback 

*/ 

L17: 

move .1 

L52_z,al 

*## Use absolute addressing mode to get pointer to Z 



move.b 

# 0 xff,(al) 

*## Make Z port all Is 


*25 

*x = 0 ; 


/* Move to right of screen 

*/ 


move .1 

L5 x,al 

*## Move X back to start 



cl r.b 

(al) 



*26 

*y = Array[01dest]; 

/* Y value at left of screen 

*/ 


move .1 

L 5l_y,al 




move .1 

#_Array,a2 




moveq .1 

#0, d7 




move.b _ 

.Oldest,d7 




move.b 

(a2,d7.1),(al) 


*27 

for(i =0 

; i<5; i++) 

{;} /* Delay 

*/ 


cl r.w 

d5 



L 101 : 

cmpi.w 

#5, d5 




bge.s 

L131 



L 121 : 

addq .w 

#1, d5 




bra. s 

L121 



*28 

*z = BLANK-OFF; 

/* Blank off 

*/ 

Llll: 

move .1 

L52 z,al 




cl r.b 

(al) 



*29 

} 





bra. s 

L14 



*fnsize= 

=78 





. even 




L53_a_d: 

. long 

0x6000 

*## Pointer constant to A_D here in ROM 


*30 } 





*31 /************************ 

***************** * * * kkkk it it it it kit it k k it it kit it k it kkkkkkkkkkkkkkkk 

k k it k kit 


*32 * This is the NMI interrupt service routine which puts the analog sample in the array 

*33 * ENTRY : Via NMI and startup 

*34 * ENTRY : Array[] and Oldest are global 

*35 * EXIT : Value held at a_d in Array[01dest], Oldest incremented with wraparound @ 256 

* 3 6 *********************************** * * * * k * * * * * * * * * * * * * * * * * * * * * * * * k * * k * * k * * k * * * * * * * * * * * * 

*37 

*38 void update(void) 

*39 { 


. even 

*40 static volatile unsigned char 


*41 Array[01dest++] = *a_d; 


.update: move.b 
addq.b 
and. 1 
move .1 
add. 1 
move .1 
move.b 

*43 } 

rts 

*fnsize=99 

.globl 
.globl 
. bss 


_01dest,d7 
#1,-Oldest 
#255,d7 
d7,al 

#_Array,al 
L53_a_d,a2 
(a 2 ),(al) 


-update 
_mai n 


const a_d = ANINPUT;/* This is the Analog 
/* Overwrite oldest sample in Array[] & inc Oldest 
*## Notice, no frame is made for this function as 


.Oldest: 

.Array: 


= .+l 

.globl -Oldest 
. even 

=.+256 

.globl -Array 


i/p port*/ 
index */ 
no autos 


(a) Resulting assembly code. 
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Table 14.14 (continued! Optimized 68000-based code. 

000004bc _update 
00000424 _main 
00000000 STARTUP 
00000408 NMI 
00000406 _exit 
00000400 START 
00000434 LI 
000004b8 L53_a_d 

00000478 L17 

00000450 L16 

00000440 L14 

0000043c L12 

000004aa L121 

000004ae Llll 
000004a4 L101 

0000e002 _Array 
00000418 L5_x 
00000420 L52_z 

OOOOeOOO _01dest 
0000041c L51_y 

000004e0 _prog_top 

OOOOeOOO _data_top 

0000el02 _stack_bottom 

00010000 _stack_top 

$ 


200000000001000000000400000000000000000000000000000000000000000000000000DB 
200020000000000000000000000000000000000000000000000000000000000000000000C0 
200040000000000000000000000000000000000000000000000000000000000000000000A0 
20006000000000000000000000000000000000000000000000000000000000000000040874 
20008000000000000000000000000000000000000000000000000000000000000000000060 
2000A000000000000000000000000000000000000000000000000000000000000000000040 
2 OOOCOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 2 O 
2000E000000000000000000000000000000000000000000000000000000000000000000000 
200100000000000000000000000000000000000000000000000000000000000000000000DF 
2 OOI 2 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBF 
2001400000000000000000000000000000000000000000000000000000000000000000009F 
2001600000000000000000000000000000000000000000000000000000000000000000007F 
2001800000000000000000000000000000000000000000000000000000000000000000005F 
2 OOIAOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 3 F 
2001C00000000000000000000000000000000000000000000000000000000000000000001F 
2001E0000000000000000000000000000000000000000000000000000000000000000000FF 
2 OO 2 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODE 
2 OO 22 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBE 
2002400000000000000000000000000000000000000000000000000000000000000000009E 
2002600000000000000000000000000000000000000000000000000000000000000000007E 
2002800000000000000000000000000000000000000000000000000000000000000000005E 
2002A00000000000000000000000000000000000000000000000000000000000000000003E 
2 OO 2 COOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOIE 
2 OO 2 EOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOFE 
2 OO 3 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODD 
2 OO 32 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOBD 
2003400000000000000000000000000000000000000000000000000000000000000000009D 
2003600000000000000000000000000000000000000000000000000000000000000000007D 
2003800000000000000000000000000000000000000000000000000000000000000000005D 
2003A00000000000000000000000000000000000000000000000000000000000000000003D 
2 OO 3 COOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOID 
2 OO 3 EOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOFD 
200400004EB90000042460F848E701604EB9000004BC4CDF06804E7300002000000020014B 
200420000000A00048E70C0442390000E0002A7C0000E002BBFC0000E1026404421D60F445 
20044000424418390000E0002A7C0000E0024245BBFC0000E102642O2279000004181E05DE 
20046000128722790000041C244D528D1E324000020700FF128760D822790000042012BCE2 
2004800000FF227900000418421122790000041C247C0000E0027E00IE390000E00012B29D 
2004A000780042450C4500056C04524560F622790000042042116088000060001E390000D9 
2004C000E00052390000E0000287000000FF2247D3FC0000E0022479000004B812924E756F 
00000001FF 


(b) Executable code. 


that is the _text section. The compile-time nature of these constants is clearly 
seen in lines 3 - 9 of Table fR. 1 4[ where they are placed in the EPROM at locations 
0041 8-0041 F h. This saves four bytes for each of the four pointer constants. 
Furthermore, neither mai n() nor update () require a frame, as no auto variables 
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are used. 

Table ll4.74l shows the tuned version of our software. It differs from Table [l4~9l 
in the following respects: 

1. The compiler has been configured for a 16-bit i nt. This obviates conversion to 
32-bit for arithmetic processes, and suits the 16-bit ALU used by the 68000/8 
processors. However, it is a double-edged sword, in that pointers are still 32- 
bit, and the use of an i nt to generate an address (e.g. as an array index) will 
require a promotion (see code between C22 and C23). 

2. The register variables i and leftmost have been redefined as int types. 
This avoids conversion extensions in arithmetic processes. 

3. Pointer constants have been defined as static, which places them in ROM. 
The alternative of defining them externally (see Table 115.5) does the same 
thing. The compiler then uses absolute addressing to get these values (see 
code between lines C21 and C22). Some compilers (not this) have a small 
model mode where the Short Absolute address mode is used. Where this is 
available, two bytes are saved for each absolute access. 

With these alterations, the total size is now down to 224 bytes plus data 
storage and the fixed-size vector table. I have used the startup stub entry to 
update (), which is the most portable technique. A few bytes may be saved at 
the expense of this portability by direct editing of the assembly-level code. For 
comparison, a hand-assembled 68000-based version is given in Table 1711721 
Another possibility, not implemented here, is to use pointers to implement the 
three array handling loops. As these loops walk through the array, the process 
may be more efficient (see Section 9.2). However, with this compiler savin gs are 
minimal, as its array-handling code is quite efficient |2]. See Table 115.51 for an 
example of this technique. 
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CHAPTER 15 


Looking For Trouble 


The process from program text to binary bits in ROM has already been charted 
in Fig. 17.51 But what then? It would be naive to presume that the production of 
a machine-code file and programmed ROM is the end of the story. Just inserting 
this ROM into the target hardware, switching on and hoping for the best is unlikely 
to be productive of anything except frustration. Invariably testing and debugging 
t hi s software will take far longer than the writing of the original code (TJ. 

Testing involves executing the software in a controlled environment, to exer¬ 
cise the various responses to typical input stimuli. It is impossible to test every 
possible pathway in all but the simplest of routines, but a range of typical and 
boundary values should help and ensure that the program behaves properly. De¬ 
composing the program into functional modules helps to facilitate this process. 

Malfunctions are said to be caused by bugs (after an alleged incident where a 
moth was caught in a relay of an early electro-mechanical computer [?)). Bugs are 
normally found by applying a series of tests which focuses down onto the area 
of software (or hardware) which is exhibiting the erroneous behavior. Hardware 
testing and debugging is aided by a range of tools, varying in complexity from 
meters through to the logic analyser. Similarly, software debug tools are available 
in various levels of sophistication, to enable the tester to ' see into the works' 
while the program executes. 

The easiest scenario arises when a general-purpose computer is used to gen¬ 
erate (i.e. compile and assemble) code which it will itself subsequently run. Its 
many resources, such as operating system, VDU and keyboard, can be utilized 
with resident debug software to test the operation of the application software. 
Virtually total ignorance of the underlying hardware is possible. 

The situation is very different when the target system is a dedicated ROM- 
based stand-alone system, usually with a different processor to the code-generating 
computer. In this cross enviro nm ent (see Fig. l7.3l b)). gone is any resident debug 
software or superfluous peripheral devices. Interaction with the hardware at the 
machine code level looms large. Problems are compounded where a high-level 
language is used as the source, as the correlation between the executing code 
and source code is tortuous. At the time of writing, high-level simulators and 
emulators are the fastest growing area of cross software support. 

In this chapter we will look at some of the debug tools available for cross-target 
support. Our time-compressed memory will be used to illustrate the character¬ 
istics of these aids. 


383 


384 C FOR THE MICROPROCESSOR ENGINEER 


15.1 Simulation 

Given that a program has been written in a high-level language for another target, 
how is it to be tested? We have already observed that a naked target will carry no 
debug overhead to permit meaningful monitoring. Furthermore, it will execute 
(obviously) at machine-code level. 

A first approach to the problem is to use a native compiler running on the 
host machine. Thus, if an IBM PC is utilized as the development system, then use 
a compiler which produces native executable files. Obviously the environment of 
the host is very different to that of the naked target. However, gross algorithmic 
problems can often be eliminated using this technique. Function input/output 
parameters can be simulated by using operating system input/output functions. 
Table [III. 141 shows a simple example, where the sum_of_n function is emulated 
using keyboard input and VDU output. 

Monitoring high-level objects can be accomplished by using output functions 
to display or print their values. In Table [KTT41 aprintfO statement inside the 
loop would be suitable. Many native compilers, especially (but not exclusively) 
targeted to MSDOS 80x86 family hosts, can be run in conjunction with a debug 
package J3]|. Such packages allow the operator to watch a selection of variables as 
the program advances in a single-step or trace mode. Alternatively a breakpoint 
may be inserted (e.g. stop at line 26, or/and when Array [6] = 0), which permits 
high-speed operation to a predetermined point, at which time execution ceases 
and variables can be examined. 

The usefulness of this native technique is enhanced if the native and cross 
compilers belong to the same family of products. Several compiler vendors pro¬ 
vide suitable products, such as Aztec and the Intermetrics/COSMIC Whitesmiths 
group. In these circumstances, native and cross products usually share common 
characteristics, such as libraries. 

Where there is a great deal of interaction between software and hardware, 
native debugging is of limited use. This is particularly the case where the tar¬ 
get processor is different to the host. For example, a 68030 MPU-based Hewlett 
Packard workstation hosting a Z80-based target. Monitoring mac hi ne-level code 
will often be necessary to reveal the more subtle problems, especially where hard¬ 
ware interaction is involved. 

One way of tackling this problem is to use the host to simulate the target MPU OQ. 
Such a cross-simulator, sometimes known as a low-level symbolic debugger, is 
particularly of use in testing cross-assembler code. However, languages whose 
compilers produce assembly-level code, can also be tested in this manner. One 
major advantage of the use of a simulator is that no target hardware is involved. 
Thus the hardware and software design stages can stay apart longer. This takes 
the load off expensive equipment, such as an in-circuit emulator, which can then 
be used for the really obscure problems and final testing. By their nature, simula¬ 
tors cannot run in real time and they still leave a lot to be desired when interaction 
with hardware is problematical. 

Most simulators take their output from the linker in terms of the machine code, 
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location data and symbol tables. Part of the host's memory space is used to hold 
t hi s mac hi ne code, and the target's data memory space is likewise mapped. The 
major fa cil ities offered by a simulator are: 

Disassembly 

Displays the contents of simulated target memory as instruction mnemonics 
— a sort of reverse assembler. 

Register and memory examine/change 

To be able to exa mi ne any internal register(s) or memory location(s) and make 
necessary changes. 

Step execution 

To execute the target program one or more instructions at a time, usually 
displaying registers after each one. 

Trace execution 

Similar to the previous item, but as fast as can be displayed. 

Breakpoints 

Insertion of conditions, such as reaching a certain address, which causes exe¬ 
cution to pause or stop. 

Execute 

Similar to Trace, but as fast as the simulator can operate with no screen output. 
Normally stops when a breakpoint is encountered. 

The operation of a simulator is very much product specific. The COSMIC/Inter- 
metrics MIMIC range of simulators have been used for the following three exam¬ 
ples. 

Our first simulation is our old friend the sum of n integers, Table PklOl Ta¬ 
ble [Hr] is a log of a simulation session, with comments added later for clarity. 
After loading in the file, the process was: 

1. Disassemble program mnemonics from the beginning (e SUM_OF_N ore 0x400). 

2. Change DO to 0xFF0003 to simulate DO.W = 0003h ($D0 = 0xFF0003). 

3. Single step until S_EXIT is reached (s or si). Note that Step goes from the 
current value of PC, here initialized to 400b when the object file was loaded. 
Thus to start again, do Spc = SUM_OF_N or Spc = 0x400. 

The second example is more elaborate. Here the target is the 6809 equivalent 
of Tables fZOl and I2TH1 We wish to trace the execution down to where the simu¬ 
lated processor attempts to fetch the final instruction (RTS) at S_EXIT. Thus we 
have to set up a breakpoint at this address. 

This time the log shown in Table [Tin2l was generated as follows: 

1. Set a breakpoint at S_EXIT (b S_EXIT or b OxEOOC). Note, br S_EXIT sets a 
break when reading from S_EXIT, which is an alternative in this situation. 
Breaks on a Write and over a range of addresses are possible. For example 
bw OxEOOO, OxFFFF breaks when a Write is attempted in memory between 
E000b and FFFFb, which is one way of simulating ROM. An unlimited number 
of breakpoints can be set. 
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Table 15.1 Simulating the program of Table \4.10\ User input shown in quotes, comments bracketed. 
"e SUM_OF_N #5" 

(Disassemble from SUM_OF_N for five instructions) 


0x000400 

SUM_OF_N: 


0x000400 02800000ffff 

andi.1 

#0xffff,d0 

0x000406 4281 

cl r.l 

dl 

0x000408 

SLOOP: 


0x000408 d280 

add.l 

dO,dl 

0x00040a 51c8fffc 

dbf 

dO,SLOOP 

0x00040e 

S_EXIT: 


0x00040e 4e75 

rts 



"$d0 = 0xff0003" 

(Set DO.L to 00FF0Q03h) 


"s" 

(Single step for as long as desired) 


dO:00ff0003 
d4:00000000 
aO:00000000 
a4:00000000 

dO:00000003 
d4:00000000 
aO:00000000 
a4:00000000 

dO:00000003 
d4:00000000 
aO:00000000 
a4:00000000 


dO:00000003 
d4:00000000 
aO:00000000 
a4:00000000 

dO:00000002 
d4:00000000 
aO:00000000 
a4:00000000 

dO:00000002 
d4:00000000 
aO:00000000 
a4:00000000 


dO:00000001 
d4:00000000 
aO:00000000 
a4:00000000 


dO:00000001 
d4:00000000 
aO:00000000 
a4:00000000 


dO:00000000 
d4:00000000 
aO:00000000 
a4:00000000 


dO:00000000 
d4:00000000 
aO:00000000 
a4:00000000 


dO:OOOOffff 
d4:00000000 
aO:00000000 
a4:00000000 


dl:00000000 
d5:00000000 
al:00000000 
a5:00000000 

dl:00000000 
d5:00000000 
al:00000000 
a5:00000000 

dl:00000000 
d5:00000000 
al:00000000 
a5:00000000 

dl:00000003 
d5:00000000 
al:00000000 
35:00000000 


dl:00000003 
d5:00000000 
al:00000000 
a5:00000000 

dl:00000005 
d5:00000000 
al:00000000 
35:00000000 


dl:00000005 
d5:00000000 
al:00000000 
35:00000000 


dl:00000006 
d5:00000000 
al:00000000 
35:00000000 


dl:00000006 
d5:00000000 
al:00000000 
a5:00000000 


dl:00000006 
d5:00000000 
al:00000000 
a5:00000000 


dl:00000006 
d5:00000000 
al:00000000 
a5:00000000 


d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 

d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 

d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 

d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 


d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 

d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 


d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 


d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 


d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 


d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 


d2:00000000 
d6:00000000 
a2:00000000 
a6:00000000 


d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 

d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 

d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 

d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 


d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 

d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 


d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 


d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 


d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 


d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 


d3:00000000 
d7:00000000 
a3:00000000 
a7:00000000 


TS I XNZVC 

ssp:00000000 sr:.,/0/. 

usp:00000000 pc:00000400 

andi.l #0xffff,d0 

ssp:00000000 sr:../0/. 

usp:00000000 pc:00000406 

clr.l dl 

ssp:00000000 sr:../0/..Z.. 
usp:00000000 pc:00000408 

add.l dO,dl 

ssp:00000000 sr:../0/. 

usp:00000000 pc:0000040a 

dbf dO,SLOOP 

ssp:00000000 sr:../0/. 

usp:00000000 pc:00000408 

add.l dO,dl 

ssp:00000000 sr:../0/. 

usp:00000000 pc:0000040a 

dbf dO,SLOOP 

ssp:00000000 sr:../0/. 

usp:00000000 pc:00000408 

add.l dO,dl 

ssp:00000000 sr:../O/. 

usp:00000000 pc:0000040a 

dbf dO,SLOOP 

ssp:00000000 sr:../O/. 

usp:00000000 pc:00000408 

add.l dO,dl 

ssp:00000000 sr:../O/. 

usp:00000000 pc:0000040a 

dbf dO,SLOOP 

ssp:00000000 sr:../O/. 

usp:00000000 pc:0000040e 

(rts ) 


(Before 

ANDI.L #FFFFh) 


(After execution. 
Next instruction 
is CLR.L Dl) 


(Status register 
showing the 
Z flag setting) 


| (PC goes back to 
| 408h, i.e. SLOOP 
| as DO.W isn't 
I -1) 


(DO.W has been 
decremented to 
-1, so end of 
DBF loop) 

(Ans is in Dl.L 
at end of 
subroutine. RTS 
isn't executed) 
















SIMULATION 387 


2. Change Accumulator_B to 03 h to simulate the passing of n (Sb = 3). 

3. Trace to breakpoint from SUM_0F_N (t SUM_OF_N). 

Breakpoints can be a great deal more sophisticated than shown here. For 
example, every time data is stored at, say, AOOO/i (simulating an output port) the 
time since the last store and a register dump may be output to the display and 
the program continued. The syntax here would be: 

bw OxAOOO ? {"time is %d\n" .time - last_time; last_time = .time; g} 

which reads: break at a write to A000 h/ then print out "ti me i s zz", where zz 
is the value of the reserved label . ti me less the value of 1 ast_ti me/ then make 
1 ast_ti me equal to the present time/ go on. Adding m OxAOOO would also print 
out the data sent to the port! The label .time is predefined by the simulator to 
give the number of cycles taken since last set up. 

Assembly-level simulators can also be used to debug C programs. As our 
example, we will simulate Table I7.141 d). as shown in Table 115.31 This time the 


Table 15.2 Tracing the program of Table \Z.iA 

"b S_EXIT" 

(Set a breakpoint at S_EXIT) 

"$b = 03" 

(Make Accumulator B 03) 

"t SUM_0F_N" 

(Trace from sum_of_N down to breakpoint) 


EFHINZVC Register states before execution of this instruction 

CCR - 

cc:. dp:00 a:00 b:03 x:0000 y:0000 u:0000 s:0000 pc:e000 ldx #0x0000 

cc:.Z.. dp:00 a:00 b:03 x:0000 y:0000 u:0000 s:0000 pc:e003 tstb 

cc:. dp:00 a:00 b:03 x:0000 y:0000 u:0000 s:0000 pc:e004 beq SEND 

cc:. dp:00 a:00 b:03 x:0000 y:0000 u:0000 s:0000 pc:e006 abx 

cc:. dp:00 a:00 b:03 x:0003 y:0000 u:0000 s:0000 pc:e007 decb 

cc:. dp:00 a:00 b:02 x:0003 y:0000 u:0000 s:0000 pc:e008 bra SLOOP 

cc:. dp:00 a:00 b:02 x:0003 y:0000 u:0000 s:0000 pc:e003 tstb 

cc:. dp:00 a:00 b:02 x:0003 y:0000 u:0000 s:0000 pc:e004 beq SEND 

cc:. dp:00 a:00 b:02 x:0003 y:0000 u:0000 s:0000 pc:e006 abx 

cc:. dp:00 a:00 b:02 x:0005 y:0000 u:0000 s:0000 pc:e007 decb 

cc:. dp:00 a:00 b:01 x:0005 y:0000 u:0000 s:0000 pc:e008 bra SLOOP 

cc:. dp:00 a:00 b:01 x:0005 y:0000 u:0000 s:0000 pc:e003 tstb 

cc:. dp:00 a:00 b:01 x:0005 y:0000 u:0000 s:0000 pc:e004 beq SEND 

cc:. dp:00 a:00 b:01 x:0005 y:0000 u:0000 s:0000 pc:e006 abx 

cc:. dp:00 a:00 b:01 x:0006 y:0000 u:0000 s:0000 pc:e007 decb 

cc:.Z.. dp:00 a:00 b:00 x:0006 y:0000 u:0000 s:0000 pc:e008 bra SLOOP 

cc:.Z.. dp:00 a:00 b:00 x:0006 y:0000 u:0000 s:0000 pc:e003 tstb 

cc:.Z.. dp:00 a:00 b:00 x:0006 y:0000 u:0000 s:0000 pc:e004 beq SEND 

cc:.Z.. dp:00 a:00 b:00 x:0006 y:0000 u:0000 s:0000 pc:e00a tfr x,d 

cc:.Z.. dp:00 a:00 b:06 x:0006 y:0000 u:0000 s:0000 pc:e00c (rts ) 


breakpoint (1) OxeOOc 
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parameter n is in absolute memory (for demonstration purposes, it is not passed 
to the function), but otherwise the process is similar: 

1. Set a breakpoint at_prog_top-l (or 0xE021) and cause it to print out the 

value of . ti me: 

br_prog_top-l ? "Execution time = %d\n" .time 

2. Initialize n to OBh, that is set memory byte at L3_n to 03h (mb L3_n 03). 

3. Trace to breakpoint from _sum_of_n (t _sum_of_n). 

Cross-simulators are little better than native debug packages in their relation¬ 
ship with target hardware. Memory-mapped port registers/buffers can be simu¬ 
lated as memory locations. On a break, their value can be changed from the key¬ 
board and execution recommenced. Interrupts can similarly be simulated from 
the keyboard. In MIMIC the predefined symbol . i rq has a bit correspondence to 
the various interrupt lines. Thus for the 68000 MPU, making . i rq = 01 000000 b 
(e.g. typing .irq = 0x40) during a break pause makes the simulator respond to a 
level-7 interrupt when processing recommences. As an example: 

b ? .time == 20000 {.time=0; .irq=0x40; g} 

generates a level-7 interrupt each 20,000 cycles (100 interrupts per simulated 
second with an 8 MHz clock) and continues on. 

Simulating high-level sourced programs of any size at this level is tedious at 
the very least. Table [7H4l was deliberately chosen to have only static variables, 
so that each variable has a meaningful label attached. Most variables in C are 
dynamic (i.e. auto), and have no fixed abode. Of course in a simulator, their posi¬ 
tion relative to the Frame Pointer can be found and therefore accessed. However, 
determining by hand where many variables are in the frame is time consum¬ 
ing. Some compilers produce a report on the size and location of each variable. 
An example of such a report is given in Table 115.41 Not only is this useful for 
assembly-level simulation, but it is the first step towards high-level cross simu¬ 
lation. 

Simulators used to debug realistic high-level sourced software must provide 
the facility to monitor directly high-level objects and instructions as well as ad¬ 
dresses, registers and assembly-level instructions. The next few examples are 
based on the Intermetrics/COSMIC CXDB (C cross DeBugger) products. These are 
high-level front ends to the MIMIC simulator (renamed MICSIM for Microproces¬ 
sor SIMulator) we have just used. The user can move down to MICSIM at any time 
to perform any machine-level task, for example to set up a breakpoint on an at¬ 
tempt to write to simulated ROM, and move backup again. MICSIM's instructions 
are the same as those illustrated in Tables [T5TT1 - | 15.31 

At the high level, the following core features are available: 

Listing 

Displays the C source with or without the resulting assembly code, around the 
current execution point in the source window. 
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Table 15.3 Tracing a C function. 

"br _prog_top-l ? {"Execution time = %d\n" .time}" 

(Set breakpoint and printout time) 

".time = 0" 

(Set time symbol to zero) 

"mw L3_n" 

(Set word in memory = int n to 0003) 

0x0001: 0x0000 = "03" 

0x0003: 0x0000 = 

" 1 " 

(List all labels (not predefined)) 


0x0001: L3_n 
0x0003: L31_sum 

0x0005: _stack_bottom 

0x0400: _stack_top 

OxeOOO: _sum_of_n 
0xe022: _prog_top 

"t _sum_of_n" 

cc:.C dp:00 a:00 b:06 x:0000 y:0000 u:0000 s:2000 pc:e000 clra 

cc:.Z.. dp:00 a:00 b:06 x:0000 y:0000 u:0000 s:2000 pc:e001 clrb 

cc:.Z.. dp:00 a:00 b:00 x:0000 y:0000 u:0000 s:2000 pc:e002 std L31_sum 

cc:.Z.. dp:00 a:00 b:00 x:0000 y:0000 u:0000 s:2000 pc:e005 ldx L3_n 

cc:. dp:00 a:00 b:00 x:0003 y:0000 u:0000 s:2000 pc:e008 beq OxeOle 

cc:. dp:00 a:00 b:00 x:0003 y:0000 u:0000 s:2000 pc:e00a ldd L3_n 

cc:. dp:00 a:00 b:03 x:0003 y:0000 u:0000 s:2000 pc:e00d addd L31_sum 

cc:. dp:00 a:00 b:03 x:0003 y:0000 u:0000 s:2000 pc:e010 std L31_sum 

cc:. dp:00 a:00 b:03 x:0003 y:0000 u:0000 s:2000 pc:e013 ldd #0xffff 

cc:....N... dp:00 a:ff b:ff x:0003 y:0000 u:0000 s:2000 pc:e016 addd L3_n 

cc:.C dp:00 a:00 b:02 x:0003 y:0000 u:0000 s:2000 pc:e019 std L3_n 

cc:.C dp:00 a:00 b:02 x:0003 y:0000 u:0000 s:2000 pc:e01c bra 0xe005 

cc:.C dp:00 a:00 b:02 x:0003 y:0000 u:0000 s:2000 pc:e005 ldx L3_n 

cc:.C dp:00 a:00 b:02 x:0002 y:0000 u:0000 s:2000 pc:e008 beq OxeOle 

cc:.C dp:00 a:00 b:02 x:0002 y:0000 u:0000 s:2000 pc:e00a ldd L3_n 

cc:.C dp:00 a:00 b:02 x:0002 y:0000 u:0000 s:2000 pc:e00d addd L31_sum 

cc:. dp:00 a:00 b:05 x:0002 y:0000 u:0000 s:2000 pc:e010 std L3T_sum 

cc:. dp:00 a:00 b:05 x:0002 y:0000 u:0000 s:2000 pc:e013 ldd #0xffff 

cc:....N... dp:00 a:ff b:ff x:0002 y:0000 u:0000 s:2000 pc:e016 addd L3_n 

cc:.C dp:00 a:00 b:01 x:0002 y:0000 u:0000 s:2000 pc:e019 std L3_n 

cc:.C dp:00 a:00 b:01 x:0002 y:0000 u:0000 s:2000 pc:e01c bra 0xe005 

cc:.C dp:00 a:00 b:01 x:0002 y:0000 u:0000 s:2000 pc:e005 ldx L3_n 

cc:.C dp:00 a:00 b:01 x:0001 y:0000 u:0000 s:2000 pc:e008 beq OxeOle 

cc:.C dp:00 a:00 b:01 x:0001 y:0000 u:0000 s:2000 pc:e00a ldd L3_n 

cc:.C dp:00 a:00 b:01 x:0001 y:0000 u:0000 s:2000 pc:e00d addd L31_sum 

cc:. dp:00 a:00 b:06 x:0001 y:0000 u:0000 s:2000 pc:e010 std L31_sum 

cc:. dp:00 a:00 b:06 x:0001 y:0000 u:0000 s:2000 pc:e013 ldd #0xffff 

cc:....N... dp:00 a:ff b:ff x:0001 y:0000 u:0000 s:2000 pc:e016 addd L3_n 

cc:.Z.C dp:00 a:00 b:00 x:0001 y:0000 u:0000 s:2000 pc:e019 std L3_n 

cc:.Z.C dp:00 a:00 b:00 x:0001 y:0000 u:0000 s:2000 pc:e01c bra 0xe005 

cc:.Z.C dp:00 a:00 b:00 x:0001 y:0000 u:0000 s:2000 pc:e005 ldx L3_n 

cc:.Z.C dp:00 a:00 b:00 x:0000 y:0000 u:0000 s:2000 pc:e008 beq OxeOle 

cc:.Z.C dp:00 a:00 b:00 x:0000 y:0000 u:0000 s:2000 pc:e01e ldd L3T_sum 

cc:.C dp:00 a:00 b:06 x:0000 y:0000 u:0000 s:2000 pc:e021 (rts ) 


Execution time = 617 
*****breakpoint(l) 0xe021 





































390 C FOR THE MICROPROCESSOR ENGINEER 


Table 15.4 A report on the variables used in the 68008 TCM system of Table fr5.5\ 
Information extracted from a:diag_682.xeq 


SOURCE FILE : a:diag_68k.c 
FILE VARIABLES : 


extern 

unsigned 

char Array[256] 

at 

0xe002 

extern 

unsigned 

char 

Oldest 

at 

OxeOOO 

extern 

unsigned 

char 

*x 

at 

0x418 

extern 

unsigned 

char 

*y 

at 

0x41c 

extern 

unsigned 

char 

*z 

at 

0x420 

extern 

unsigned 

char 

*a_d 

at 

0x424 

extern 

unsigned 

char 

*diag_port 

at 

0x428 


FUNCTION : extern int mainO lines 11 to 40 at 0x42c-0x4d4 
VARIABLES: 

register unsigned char *array_ptr at reg. a5 
register unsigned char i at reg. d5 
register unsigned char leftmost at reg. d4 

FUNCTION : extern void updated lines 50 to 53 at 0x4d4-0x4f8 


<- Added comments 

<- These are the globals 
<- known throughout file 


<- Function mainO 

<- All its local variables 


<- Function update!) has 
<- no local variables 


FUNCTION : extern void diagnosticO lines 61 to 74 at 0x4f8-0x548 <- Nor has diagnostic!) 


FUNCTION : extern void output_test() lines 77 to 86 at 0x548-0x572 <- 
VARIABLES: <- 

register unsigned char count at reg. d5 


Ftn output_test has 
only 1 reg. variable 


FUNCTION : extern void input_test() lines 88 to 92 at 0x572-0x590 


<- Ftn input_test() has 
<- no local variables 


FUNCTION : extern void RAM_test() lines 94 to 111 at 0x590-0x5d0 <- Function RAM_test() 

VARIABLES: <- All its local variables 

register unsigned int i at reg. d5 
register unsigned char temp at reg. d4 
register unsigned char *memory at reg. a5 


FUNCTION : extern void ROM_test() lines 113 to 120 at 0x5d0-0x606 
VARIABLES: 

register unsigned char *address at reg. a5 
register unsigned short sum at reg. d5 


<- Function ROM_test() 

<- All its local variables 


000004dc 

_update 

00000554 

L133 

0000053c 

0000042c 

mai n 

00000514 

L142 

00000598 

00000000 

STARTUP 

000004ce 

L151 

0000049c 

00000408 

NMI 

000005d6 

L114 

0000e002 

00000406 

exi t 

000004d2 

L141 


00000400 

START 

000005a6 

L104 


00000444 

LI 

00000500 

L122 


00000468 

L17 

000004C8 

L131 


00000460 

L15 

00000420 

_z 


00000550 

_output_test 

00000428 

_diag_port 


0000057a 

input test 

0000041c 

_y 


0000044c 

Lll 

00000418 

_x 


000005dc 

ROM test 

OOOOeOOO 

_01dest 


0000060c 

L165 

00000500 

_diagnostic 


000005fe 

L135 

00000424 

_a_d 


000005d0 

L144 

00000612 

_prog_top 


00000528 

L162 

OOOOeOOO 

_data_top 


000005ec 

L125 

0000el02 

_stack_bottom 




00010000 

_stack_top 



L103 

_RAM_test 

L101 

_Array 


Monitor 

Displays state of C objects or values of C expressions continuously in the 
monitor window. 

Update 

Allows a C data object to be altered at any time. 
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Step execution 

Steps through the C program, each step executing one or more C source lines. 
During this time, variables may be continually monitored, and the function 
window shows which function the program is in and what values were passed 
to it. Also the state of the frame and any variables in that function can be 
examined. 

Breakpoints 

Insertion of high-level conditions, such as executing a C line or entering a func¬ 
tion, which causes execution to pause. Actions may be taken automatically on 
pause, such as changing the value of a variable. 

Execute 

Runs simulation at full speed from halt point, normally to the next breakpoint. 

For our first example, consider the screen dump of Fig. ll5.ll al. This shows the 
sum-of-integers program of Table f7T4l in the central code window. The cursor is 
at line 6, which has yet to be executed. The variables n and sum appear in the top 
left monitor window. To get to this point, the following commands were entered: 

1. Step from line 1 (entry point) to line 5 (s5 or five single steps, s). 

2. Command that the variables n and sum be monitored (m n,sum). 

3. Set (Update) the value of n to 20 to simulate a passed parameter (u n 20). 

4. Single step around the loop once, that is four steps (s4). After doing this, n 
has been added to sum, thus sum is 20. Also n has been decremented, thus n 
is 19. 

With the loop operation checked, the algorithm can be verified by either single 
stepping until line 12, or more conveniently, setting a breakpoint and executing 
at full speed. The screen dump of Fig. 115.11 b) is the result, with the following 
additional commands: 

1. Set a breakpoint at line 11 (b : 11) 

2. Go from current cursor (g :1 1 or just g) 

3. Look at register state at breakpoint (r) 

Now n has been decremented to zero and sum is correctly read as 210. Also 
shown in the code window is the state of the registers. If we could have gone back 
to the calling function, Accumulator_B would have been D2 h (i.e. sum returned). 
The register display can be toggled on and off by entering r. 

Figure 11 5 .li b) showed that the underlying target was 6809 code. If the regis¬ 
ter display had not been toggled, this fact would not have been known (actually 
Fig. ll5.ll a) was generated using the 68000 simulator, just to illustrate this point!). 
Mac hi ne independence is a feature of high-level symbolic simulators. It is pos¬ 
sible to step showing the mix ture of high and low-level codes, but stepping and 
monitoring are still at source level. 

The top right window shows both the current function and the function path 
taken to arrive at the current point. This is not very informative in the simple 
situation simulated in Fig. 115.11 but is useful in realistic situations. Figure 115.21 
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Insert screen dump Fig. 15.1 here please. 


(a) Checking the loop operation. 


(b) Going to the termination. 


Figure 1S.1 Tracing function sum_of_n(). 
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Insert screen dump Fig. 15.2 here. 


Figure 15.2 Illustrating the function path in reaching line 27. 

depicts the exponentiation program of Table 19.11 Three functions are coded, 
namely mai n(), power () and abs(). I have stepped through this program until 
exp is 1. The function window then shows abs (1562 5) at the bottom, which says 
you are now in function abs() which has had 15,625 passed to it (i.e. result). 
Above this is power (25,5), which says the entry point was from power () , to 
which had been passed the values 25 and 3. Finally above this is main(), the 
caller of power (). The 68000 simulator was used for this diagram. 

Finally let us exa min e our time-compressed memory software of Table UdTl In 
Fig. 115.3( a) I have stepped around the loop so that i is ten. The monitor window 
shows the contents of x (the X D/A converter), which is just i, the contents of 
y (the Y D/A converter), the variable Oldest (changed by update(), here = 0) 
and the array value currently being sent out to y (here Array [10]). This shows 
that expressions and indirection can be monitored, as well as simple variables. 
If pointers are monitored, they display in hexadecimal. Ordinary objects default 
to decimal, but using the mx command (examine Memory in hexadecimal) forces 
hexadecimal. 

In Fig. ll5.31 b) I have converted the code window to a view box of the variables 
in function mainQ. The top five (i, leftmost, *x, *y and *z) are all auto vari- 
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Insert screen dumps Fig. 15.3 here please. 


(a) Stepping around the loop until i is 1 0. 


(b) Viewing the variables. 


Figure 15.3 Simulating the time-compressed memory' software. 
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ables and their address is given in the System stack (set to 400/i by the startup 
code in the 6809 version). As *x, *y and *z are pointers, their value is given in 
hexadecimal. 

A r r ay [ ] is an external variable and is listed under file variables. All 256 values 
are given. If the window is scrolled down, the file variable 01 dest is given as: 

at 0x1 extern unsigned char Oldest = 0 

The screen dumps shown in Fig. |15.3[ a) and (b) were taken on different runs 
and thus *y and Array [10] vary between the two situations. 


Insert screen dump Fig. 15.4 here. 


Figure 15.4 Simulating an interrupt entry into update (). 

Both Ar r ay [] and Oldest objects are updated by an interrupt service routine. 
How is such an interrupt simulated with CXDB? To ' generate' an interrupt at any 
time (typically in-between steps or at a break point) requires a move down to the 
underlying assembly-level simulator (MICSIM with this product). The sequence 
of commands to generate the screen dump shown in Fig. ll S.TI was: 


1 / 

2 . irq = 1 


Move to MICSIM (assembly level) 
Making this variable = 1 simulates NMI 
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3 

4 

5 

6 

7 

8 


/ 


Return to the high level 
Step 

Monitor these variables 
Step 

Simulate the A/D converter's output 
Step 


s 


mx Oldest, Array[Oldest-l ] 


s 


u &a_d 0x5 5 


s 


The monitor screen shows the array element has taken on the simulated value 
5 5h and Oldest has been incremented to 5. Another step and the simulator 
returns to mai n(). 

In moving between levels, care must be taken. For example the compiler linker 
places the NMI vector at E7FC:Dh, assuming a 2716 EPROM (see Table fT4~4) . How¬ 
ever, the simulator has memory all the way up to FFFF/i, and thus goes to FFFC:D/i 
on a simulated NMI interrupt. Hence a manual setting up of the vectors is needed 
(mW OxFFFC NMI). Also some high-level simulators do not execute assembly-level 
startup routines, and these may require execution at the low-level simulation 
before beginning the high-level process. 


Insert screen dump Fig. 15.5 here. 


Figure 15.5 Mixed-mode simulation using XRAY68K. 

Our closing example shows a slightly more sophisticated high-level simulator, 
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based around the 68000-family Microtec compilers. This XRAY68K simulator is 
a true window-based package in that an on-screen cursor can be moved to any 
window and used to scroll up or down El. Also windows may be altered in size 
and even removed if desired. Both high- and low-level simulation is provided 
without having to move between packages. Figure 115.51 shows a simulation of 
the array-clearing routine from the time-compressed memory main routine. The 
simulator is set to its low-level mode. Here simulation is done at assembly-level, 
each step being one machine instruction. Nevertheless, the Data window shows 
the C-level variables being monitored. The state of the System stack is shown 
in window 14, which can be entered and scrolled up or down as desired. The 
startup routine sets A7 (i.e. the SSP) to 10000 h before going to main(). All 
registers and flags are shown in window 13. The pseudo register pi gives the 
Previous Instruction address. This particular software is set up to operate at a 
level-2 interrupt, which explains the setting of the three interrupt mask bits to 
001 b. The simulator can be changed between high- and low-levels by using the 
mode command or toggling the F3 function key (MSDOS version). 

One interesting feature of XRAY68K, is the ability to simulate input and output 
ports. In the former case the command inport <address> stops when <address> 
is read and accepts data from the keyboard before continuing; for example in¬ 
port 6000 would simulate the A/D converter. The command outport sends data, 
for example to a printer, when a specified memory location is written to. 


15.2 Resident Diagnostics 

Unless a commercial system is going to be used as the target, a major difficulty 
lies in the verification of the hardware integrity. Although the application soft¬ 
ware may have been checked using simulation, its testing in its intended envi¬ 
ronment cannot easily be carried out if the latter's operation is uncertified. Even 
with bought-in hardware, malfunction may occur in service. In such a situation, 
is it a hardware or software fault? 

In practice little more than d.c. and continuity tests can be carried out on the 
hardware as it stands. In order to isolate the testing of the hardware and the appli¬ 
cation software, it is necessary to introduce a package of programs specifically 
designed to exercise the various components. Such diagnostic software could 
be transported to the hardware enviro nm ent by using an in-circuit emulator, as 
discussed in the next section. Alternatively, the EPROM(s) may be removed and 
replaced by a diagnostic EPROM-based package. Where sufficient capacity exists, 
this package may co-reside with the application package, and this will be partic¬ 
ularly convenient for field servicing, especially if accessible by the customer. 

Following along the same path, hardware should be built at the outset for 
testability. In a microprocessor-based system, this usually means designing in a 
free-run facility. A free-running microprocessor has a null instruction jammed 
on to its data bus, and it spends all its time running this phantom software. 
During this endless execution, the address bus is incrementing, fetching down 
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the next null instruction. Thus all address logic, Chip Enable connections and 
some control signals, such as Reset and Clocks, can be monitored dynamically. 
Simple test equipment, such as a logic probe or oscilloscope are adequate for this 
purpose. Free running is especially useful for signature analysis |6]. 


6809 




(a) Normal running. 


(b) Free running. 




(c) Normal running. 


(d) Free running. 


Figure 15.6 Free-running your microprocessor. 


A null instruction has certain common characteristics, irrespective of the tar- 
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get MPU. Firstly it should not execute any Write cycles, as the data bus has been 
hijacked. This means that it will either do something on an internal register or 
even nothing at all. Secondly, its op-code should be the size of the data bus, or 
if larger, a repetitive multiple. 

Figure [T5TU1 a) and (b) shows the free-run facility applied to the 6809 MPU. Nor¬ 
mally the switch is open, and the two back-to-back diodes do not conduct. With 
the switch closed and the data bus isolated from the outside world, the pattern 
01011111b (5F h) is jammed onto the bus. Thus on Reset, the 6809 fetches down 
the two bytes at FFFE:F/t, 5F5F/t in t hi s case, and commences execution at this 
address. Its first instruction is 5F/t, or CLRB. Once this has been done (all Read cy¬ 
cles), the instruction at 5F60/t is fetched, again CLRB...ad infinitum. CLRB rather 
than NOP was chosen, as the latter's op-code of 01 h would require seven diodes. 

As long as the 6809 free runs, its address bus acts as a 16-bit counter, cycling 
from 0000 h to FFFF h. Assuming a 1MHz clock (4 MHz crystal), a 15 will cycle in 
2 16 x 2 = 0.131 s, a 14 in 0.0655 s, down to 2 ps for a 0 . During this time R/W and 
E and Q can be monitored using an oscilloscope. 

The address decoder outputs will last ^ of this cycle time, and will appear in 
the correct sequence. Some typical examples are shown in Fig. These can in 
turn be traced to the appropriate Chip Enables. Although the data bus is discon¬ 
nected from the MPU during this time, it will still be activated by any enabled in¬ 
put device. Thus using a 2-beam oscilloscope, monitoring the Switch_Port_Enable 
(i.e. 8000/t) and d 0 , will enable the state of Switch 0 to be seen, gated through to 
the data line. Similarly, activity at the time of the EPROM and RAM Chip Enables 
can be viewed. 



A15 

3UM (0000k) 

A/3) (WOOL) 

3)f&_&/fP (AOOOk) 


33-AL 0 A A A0 )>i0 /3)u>- 


Figure 15.7 One free-run cycle, showing RAM, A/D and DIG_0/P Enables. 
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The requirements for a 68000 null instruction are more stringent, as its op¬ 
code must be even. This is because of the requirements that both PC and SP 
must be even, and these will be equal to the null op-code after Reset. Should an 
odd word be fetched for these, then a fatal Double-Bus fault will occur 0. The 
word size of a 68000 op-code puts further restrictions on the choice of a null 
instruction for the 68008 processor, as this will fetch the op-code down in two 
identical bytes. Both considerations rule out the NOP instruction, with its op-code 
of 4E-71 h. Fortunately the op-code for 0RI. B #0, DO is 00-00 h, and this fulfils 
all these requirements. 

The free-run circuitry shown in Figs I15.6l c) and 1 1 5. Li d) comprises two head¬ 
ers. The normal header simply connects the eight data lines and DTACK directly 
through. Free running is accomplished by replacing this by a header shorting 
these lines to ground. 

As in the 6809 case, the PC and hence the address bus repetitively cycle 
through the entire address space. As the null instruction takes 16 cycles, then at 
8MFlz a 2ps instruction time is obtained. Address line a 19 takes 2 20 x 2 = 2.1 s 
to complete a sweep. During this time both AS and DS operate in the normal way, 
and R/W is high (Read). 

The address decoder outputs cycle with a repetition rate of 2 16 x 2 = 0.131s, 
as a! 5 is the highest decoded line. Waveforms are si mi lar to those for the 6809, 
shown in Fig. 115.71 but decoder outputs are qualified by AS, giving striated Chip 
Enables. As previously described, these can be used with a 2-beam oscilloscope 
to monitor activity on the data bus. The DTACK generator can also be monitored, 
although this is trivial in simple circuits such as shown in Fig. 113.31 Logic analyzer 
traces showing the 68000 in this free-run mode are shown in reference [8l . 

The free-run facility is useful, as it requires a minimum of built-in test hard¬ 
ware. It is possible to take the process a stage further, and incorporate a hard¬ 
ware single-step facility ©. However, this isn't often done, as the extra testability 
rarely justifies the expense. 

With a reasonable assurance that the target hardware is functioning, the diag¬ 
nostic software can be loaded in. This can be done by using a romulator (ROM 
emulator) or programming an EPROM and inserting into its socket. The former 
uses a block of dual-port RAM to take the place of ROM memory. One port is con¬ 
nected to the ROM socket in the target system via a ribbon cable and DIL plug. 
The other port is controlled from the terminal, typically the workstation on which 
the compiler/assembler runs. A driver software package downloads hex files to 
t hi s RAM, usually through a serial link. With the loading completed, the ROMula- 
tor can be switched to emulate mode, and will appear as a programmed ROM [9]. 
The use of an in-circuit emulator for this purpose is the subject of the following 
section. 

The circuits of Figs 113.ll and l 13.31 have a 4-bit switch port available to choose 
between normal and diagnostic modes. With this port at zero at power-up, the 
normal application program is run. In the diagnostic mode, one of four tests are 
made, as follows: 
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Switch 0: Check analog and digital output ports 

Switch 1: Check analog input port 

Switch 2: Check RAM chip 

Switch 3: Check ROM chip 

Once one of the two modes have been entered, change-over can only be imple¬ 
mented through a reset. After Reset the switch port can be used as a normal 
run-time port. 

The application software shown on the first page of Table H531 is basically a 
modified version of Table [T4.141 There are two significant changes. Firstly, all 
ports (now including the switch port, named di ag_port) are defined externally, 
that is before mai n(). This is because they are needed by the various diagnostic 
routines, besides mai n () and update (). Although it is considered poor program¬ 
ming practice to use public objects unnecessarily, hardware ports are by nature 
global. As can be seen from Table 115.61 such objects are stored as constants 
in ROM in the same manner as the static const equivalents of Table fl4.141 
They could still be qualified as stati c, in which case they would not be declared 
globally known to the linker. 

The second change checks the state of di ag_port (ANDed with 00001 111 b 
to zero the undefined upper four bits). If non-zero (True), then execution is 
transferred to function diagnostic(). Otherwise the time-compressed mem¬ 
ory endless loop is entered. The diagnostic software thus adds nothing to the 
execution time of the applications software. 

Function diagnostic() comprises a main body having an endless loop se¬ 
lecting one of four subfunctions, depending on which switch is set. Notice from 
Table HT6t b). lines C69 - C72, that the BTST instruction is used to check the state 
of the target switch, rather than use the less efficient AND or BIT instruction. 

The output_test() function simply counts up from 0 to 255 and sends each 
value to the Z digital and X analog output ports. The complement is sent to the 
Y analog port. Using an oscilloscope, the X and Y ports give ramps, up and down 
respectively, as shown in Fig. 115.81 The Z port acts as an 8-bit counter. 

The i nput_test () function ' connects' the analog input port to the two analog 
outputs. Thus using a sinewave generator as an input should give two quantized 
copies at the output. The switch input port is of course implicitly tested by getting 
to this routine in the first place. 

Testing the RAM chip is in essence a matter of sending out a test pattern 
(1 010101 0b in this case) to each cell in turn and checking that it gets there lUcfl . 
This is of course a destructive test, so the original value must be fetched and 
saved, before each cell is checked (line C102) and returned afterwards (line C109). 
The pointer variable address is used to move up through memory. The values 
RAM_START (a pointer) and RAM_LENGTH are defined for the circuit in the header 
file (Tables fl4. 21 and I14.91 . The digital Z port is used to indicate pass or fail by 
being set respectively to all ones or all zeros. Of course exercising with such a 
simplistic test pattern is not a fully comprehensive verification; for example, it 
will not detect a stuck at bit 0 error. However, the principle is the same for a 
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more sophisticated set of test patterns. 

Checking the ROM is something that is likely to be carried out as part of a 
field test. This is normally done by replacing an unused word by a 16-bit error 
checking code, which is such that the sum of all memory contents is zero. Most 
EPROM programmers will give the 16-bit sum of all word locations (i.e. 16 bits at 


Table fim Complete 68008 package, including resident diagnostics (continued next 
page). 

/* Version 01/02/90 */ 

#include <hard_68k.h> 

unsigned char Array [256]; /* Global array holding display data */ 

unsigned char Oldest; /* Index to the Oldest inserted data byte (left point on scr*/ 

unsigned char * const x = ANAL0G_X; /* x points to a byte @ (address) ANAL0G_X */ 

unsigned char * const y = ANAL0G_Y; /* y points to a byte @ (address) ANAL0G_Y */ 

unsigned char * const z = Z_BLANK; /* The z-mod port (digital port) */ 

volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */ 

volatile unsigned char * const diag_port = SWITCH; /* The z-mod port (digital port) */ 

mai n () 

{ 

register unsigned char * array_ptr; /* Pointer into array */ 

register unsigned char i; /* Scan counter */ 

register unsigned char leftmost; /* The initial array index when x is 0 */ 

void diagnostic (void); /* Define the diagnostic function */ 

if(*diag_port&0x0f) /* Call the diagnostic function if switch port set to non-zero */ 
{diagnostic();} 

Oldest = 0; /* Start New index at beginning of the array */ 

for(array_ptr=Array; array_ptr<Array+256; *array_ptr++=0) {;} /* Clear array */ 

while(l) /* Do forever display contents of array */ 

{ 

leftmost = Oldest; /* Make leftmost point on screen the oldest sample */ 

for (array_ptr=Array, i=0; array_ptr<Array+256;) 

{ 

*x = i; /* Send x co-ordinate to X plates */ 

*y = *(array_ptr++ +leftmost)&0x0ff; /* and the display byte to the y D/A */ 

} 

*z = BLANK_0N; /* Blank out for flyback */ 

*x = 0; /* Move to right of screen */ 

*y = Array[01dest]; /* Y value at left of screen */ 

for(i=0; i<5; i++) {;} /* Delay */ 

*z = BLANK_0FF; /* Blank off */ 

} /* Do another scan */ 

} 

/***** * -.V * * ****************************** * -.V * * Vf * * -.V * * * ********* * * * * *********** * * * * ******** * Vf 

* This is the NMI ISR which puts the analog sample in the array & updates the New index* 

* ENTRY : Via NMI and startup * 

* ENTRY : Array[] and Oldest are global * 

* EXIT : Value held at a_d in Array[01dest], Oldest incremented with wraparound at 256* 

* * * * * * * * * * * * * * * * * it * * * * * * * * * * * * * * * * * * * * * * * * * Vf * * * * * * * * * * * * * * * •ic * * * * * iV * * iV * * * * * * * * * * * * * * * * * * * J 

void update(void) 

{ 

Array[01dest++] = *a_d;/* Overwrite oldest sample in Array[] & inc Oldest index mod-256 */ 

} 
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Table 15.5 (continued) Complete 68008 package, including resident diagnostics. 

'********* * * * * * * -.v * * -.v *************** * * * * * * * * * * ************** * * * * * * * * * * * * * * ************** * * 

* The diagnostic routine calling up 1 of 4 tests depending on the state of the switches* 

* ENTRY : When switches are non-zero * 

* EXIT : Endless loop, use Reset to exit, first setting all switches to zero * 

****** * * * * * * * * * * * * * *************** * * * * * * * * * * * * * * * * ********* * * * * * * * * * * * * * * * * * * * ********* * 


void diagnostic (void) 

void output_test(void); 
void input_test(void); 
void RAM_test(void); 
void R0M_test(void); 
while(l) 

{ 

if (*diag_port&0x01) 
else if(*diag_port&0x02) 
else if(*diag_port&0x04) 
else if(*diag_port&0x08) 

} } 


/* Declare each diagnostic sub-function 


/* Do forever the diagnostic tests 

{output_test();} 

{input_test();} 

{RAM_test();} 

{R0M_test();} 


V 


V 


void output_test(void) 

register unsigned char count = 0; 
do 

{ 

*x = count; /* Send count out to X D/A converter, i.e. ramp up 

*z = count; /* and to the Z digital port 

*y =~count; /* ramp Y output down 

} while(++count != 0); 

} 


V 

V 

V 


void input_test(void) 

*x = *a_d; 

*y = *a_d; 


/* Get input from a_d and send to X d/a */ 

/* and to Y d/a */ 


void RAM_test(void) 

{ 

register unsigned int i; 
register unsigned char temp; 

register unsigned char * address;/* Address of the memory byte being tested */ 

*z = 0; /* Set digital port to all zeros (pass) */ 

for(address=RAM_START; address<RAM_START+RAM_LENGTH;) 

{ 

temp = *address; /* Get ith memory byte */ 

*address = OxAA; /* Send out 10101010b to it */ 

if(*address != OxAA) 

{ /* IF not this value THEN signal failure by sending out 11111111b*/ 

*z = OxFF; 

break; 

} 

*address++ = temp; /* Restore original value */ 

} } 


void R0M_test(void) 

{ 

register unsigned short * address; /* Address points to 16_bit word in EPROM */ 

register unsigned short sum=0; 

*z = 0; /* Set digital output to all zeros to signal pass */ 

for(address=R0M_START; address<ROM_START+ROM_LENGTH; sum+=*address++) {;} 

if(sum) {*z = 0xFF;}/*IF a non-zero sum THEN signal error by digital output = 10101010*/ 


a time). Unprogrammed locations are usually FF/i, and so one must be added to 
this sum to compensate for the overwritten word (FFFF/i is -1). Thus we have 
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Figure 15.8 The output_test() traces. 


for this checksum (CS): 
CS + (sum + 1) = 0 


or 


CS = -sum - 1 = (sum' + 1) - 1 = sum' 

where sum' is the l’s complement and sum' + 1 is the 2’s complement, that is 
-sum. Hence all that is needed is to invert the modulo-65536 (16-bit) summation 
of all EPROM words and overwrite a convenient unprogrammed word. In cases 
where unprogrammed locations are zero, then one should be subtracted from 
the inverted summation. 

Function R0M_test 0 walks through the contents of the EPROM using a pointer- 
to short moving from R0M_START to R0M_START + R0M_LENGTH. Each word is 
added to the 16-bit variable sum, which eventually will give the modulo-65536 
check digit, which is hopefully zero. Care must be taken as the checksum is not 
always calculated in this way. For example the modulo-65536 sum of all bytes 
does not give the same answer. 

Code generated by the di agnosti cs () source is given in Table ll5.6l The time 
compressed code is virtually the same as in Table [T4. 14 1 and is not reproduced 
here. The assembly-level code is commented and is straightforward. One inter¬ 
esting point concerns lines C90 and C91. The reader might suppose the textbook 
equivalent: 

*x = *y = *a_d; 

to be the same. Not necessarily so. This compiler implemented this as follows: 
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1. Get value from a_d and send out to y. 

2. Get value from y and send out to x. 


Table lT5~6l Code for the 68008 implementation (continued next page). 


* 5 unsigned 


_x: 

* 6 unsigned 

-y: 

* 7 unsigned 


* 8 volatiie 
_a_d: 

* 9 volati1e 

_diag_port: 

* 10 

* 11 main() 


char * const x = ANAL0G_X; /* x points to a byte @ (address) ANAL0G_X 
. text 
. even 

.long 0x2000 

char * const y = ANAL0G_Y; /* y points to a byte @ (address) ANAL0G_Y 
. even 

.long 0x2001 

char * const z = Z_BLANK; /* The z-mod port (digital port) 

. even 

.long OxaOOO 

unsigned char * const a_d = ANINPUT; /* This is the Analog input port 
. even 

.long 0x4000 

unsigned char * const diag_port = SWITCH; /* The z-mod port (digital port) 
. even 

.long 0x8000 


*/ 

*/ 

V 

*/ 

*/ 


(a) Defining the ports as external constants. 


* 

61 

void diagnostic (void) 



* 

62 

{ 





. even 



* 

63 

void output_test(void); 

/* Declare each diagnostic sub-function 

*/ 

* 

64 

void input_test(void); 



* 

65 

void RAM_test(void); 



* 

66 

void R0M_test(void); 



* 

67 

while(l) 

/* Do forever the diagnostic tests 

*/ 

* 

68 

{ 



* 

69 

if(*diag_port&0x01) 

{output_test();} 



_diagnostic: 


L151: 

move.1 

_diag_port, 

al 

*## Point Al to switch port 


btst 

#0,(al) 


*## Test switch 0 


beq. s 

L171 


*## IF zero THEN try next switch 


jsr 

_output_test 

*## ELSE do output test 

* 70 

else if(*diag_ 

_port&0x02) 


{input_test();} 


bra. s 

L151 


*## and redo the switch scan on return 

L171: 

move. 1 

_diag_port, 

al 

*## Repeat the above for switch 1 


btst 

#1,(al) 




beq. s 

L112 




jsr 

_input_test 

*## which if set commands the input analog port test 

* 71 

else if(*diag_ 

_port&0x04) 


{RAM_test();} 


bra. s 

L151 



L112: 

move. 1 

_diag_port, 

al 



btst 

#2,(al) 


*## IF switch 2 is set THEN 


beq. s 

L132 




jsr 

_RAM_test 


*## go do the RAM test 

* 72 

else if(*diag_ 

_port&0x08) 


{R0M_test();} 


bra. s 

L151 



L152 : 

move.l 

_diag_port, 

al 



btst 

#3,(al) 


*## IF switch 3 is set THEN 


beq. s 

L151 




jsr 

_R0M_test 


*## go do the ROM test 


bra. s 

L151 




* 73 } 

* 74 } 

* 75 

* 76 
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Table lTiTSl Code for the 68008 implementation (continued next page). 


* 77 void output_test(void) 

* 78 { 

. even 

_output_test: move.! d5,-(sp) 


* 79 

register unsigned 

char count = 

: 0; 





ci r. b 

d5 

*## 

count lives in D5.B and is zeroed 


* 80 

do 






* 81 


{ 





* 82 


*x = count; 


/* Send count out to X D/A converter; ie ramp 

up V 

L162: 


move. 1 

_x,al 

*## 

Al points to X port 




move. b 

d5,(al) 

*## 

send count out to this port 


* 83 


*z = count; 



/* and to the Z digital port 

*/ 



move. b 

_z,al 

*## 

Al points to Z port 




move. b 

d5,(al) 

*## 

send count out to this port 


* 84 


*y =~count; 



/* ramp Y output down 

V 



move. 1 

_y, al 

*## 

Point to Y analog port 




ci r.w 

d7 






move. b 

d5,d7 

*## 

Get count again! 




not.w 

d7 

*## 

Invert it (ie -count) 




move. b 

d7,(al) 

*## 

and send it out 


* 85 


} while(++count != 0); 






addq.b 

#l,d5 

*## 

First increment count 




bne.s 

L162 

*## 

IF not folded over to zero (ie 256) THEN 

repeat 



move. 1 

(sp)+,d5 






rts 





* 86 

} 






* 87 







* 88 

voi d 

input_test(voi d) 




* 89 

{ 

. even 





* 90 

*x = 

*a_d; 



/* Get input from a_d and send to X d/a 

V 

_input_test: move.l 

_x,al 

*## 

Point Al to X d/a output port 




move. 1 

_a_d,a2 

*## 

Point A2 to a/d input port 




move. b 

(a2),(al) 

*## 

Send input data to output X port 


* 91 

*y = 

*a_d; 



/* and to Y d/a 

*/ 



move. 1 

_y,al 

*## 

Point Al to Y d/a output port 




move. 1 

_a_d,a2 

*## 

Point A2 to a/d input port 




move. b 

(a2),(al) 

*## 

Send input data to output Y 




rts 





* 92 

} 






* 93 







* 94 

voi d 

RAM_test(void) 




* 95 

{ 






_RAM_ 

.test: 

movem.1 

d5/d4/a5,- 

(sp) 



* 96 

register unsigned 

i nt i ; 




* 97 

register unsigned 

char temp; 




* 98 

register unsigned 

char * address 

/* Address of the memory word being tested *, 

* 99 

*z = 

0; 



/* Set digital port to all zeros (pass) 

* 



move. 1 

_z, al 

*## Al points to Z port 




cl r.b 

(al) 

*## Send out 00000000b 



While this maybe logically correct, y is a write-only port; it cannot be read! There 
is no way in ANSII C to designate an object write-only. A read-only object is of 
course designated as const. Designating such an object vol ati 1 e may help, in 
that it should signal to the compiler that it cannot depend on what it reads, but 
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Table 15.6 (continued) Code for the 68008 implementation. 
* 100 for(address=RAM_START; address<RAM_START+RAM_LENGTH;) 


/* Get ith memory byte 
et byte pointed to by address 
/* Send out 10101010b to it 



move.1 

#0xe000,a5 

*## / 

L113: 

move.1 

a5,d7 

*## / 


cmpi.1 

#0x10000,d7 

*## 1 


bcc.s 

L123 

*## : 

* 101 

{ 



* 102 

temp = *address; 



move.b 

Ca5),d4 

*## i 

* 103 

*address = 

OxAA; 



move. b 

#0xaa,(a5) 

*## i 

* 104 

if(^address 

!= OxAA) 



move. b 

Ca5),d7 

*## 


cmp.b 

#0xaa,d7 

*## 


beq. s 

L133 

*## 

* 105 

{ 

/* IF not this 

value 

* 106 

*z = OxFF; 



move.1 

_z,al 

*## 


move. b 

#0xff,(al) 

*## 

* 107 

break; 




bra. s 

L123 

*## 

* 108 

} 



* 109 

*address++ 

= temp; 


L133 : 

move. b 

d4,(a5)+ 

*## 

* 110 

} 




bra. s 

L113 

*## 

L123 : 

movem.1 

(sp)+,d5/d4/a5 


rts 




111 

112 

113 

114 


V 

3 v 


*## and exit the loop 

/* Restore original value 
F ok, restore old RAM byte & mov 

## and repeat test on next RAM byte 


*/ 


void R0M_test(void) 

{ 


. even 

_R0M_test: movem.l d5/a5,-(sp) 

* 115 register unsigned short * address; /* Address of EPROM word */ 

* 116 register unsigned short sum=0; 

clr.w d5 *## D5.W used for sum 

* 117 *z = 0; /* Set digital output to all zeros to signal pass */ 

move.l _z,al *## Send out 00000000b to Z port to signal ok 

clr.b (al) 

* 118 for(address=R0M_START; address<R0M_START+R0M_LENGTH; SUM+= *address++) {;} 

suba.l a5,a5 *## Funny clear of A5, R0M_START is 0 in this case 


L143: 

move.1 

a5,d7 

*## 

address moved to D7 for compare (why?) 


cmpi.1 

#0x2000,d7 

*## 

Gone over the top of ROM? 


bcc. s 

L153 

*## 

IF yes THEN exit for loop 

L163: 

add .w 

(a5)+,d5 

*## 

ELSE add word to sum 


bra. s 

L143 

*## 

and do again 

* 119 if (sum) 

{*z = OxFF;} /* IF a non 

-zero 

sum, signal error by digital o/p = 10101010b 

L114: 

tst.w 

d5 

*## 

Is sum zero? 


beq. s 

L104 

*## 

IF yes THEN no problem 


move.1 

_z,al 

*## 

ELSE send out 11111111b to Z port 


move. b 

#0xff,(al) 

*## 

to signal an error has occurred 

L104: 

movem.1 

(sp)+,d5/a5 




rts 




* 120 } 






.glob! 

_update, _R0M_ 

.test 

_RAM_test, _input_test, _output_test, _main 


.globl 

_diagnostic, 

_diag_ 

.port, _a_d, _z, _y, _x 


. even 




_01dest: 


= .+1 




.globl 

_01dest 




. even 




_Array: 


= .+256 




.globl 

_Array 




(b) Coding for diagnosticQ and supporting functions. 


this is a grey area of compiler design. 

Table [T531 declared that this was the software for the 68008 implementation. 
In fact it is applicable to the 6809 target except for RAM_test (). Testing RAM in C 
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is dangerous, as the stack and frame area is of course in this part of memory. Even 
though I have made RAM_test() non-destructive, problems can arise. By making 
temp a regi ster variable, the original value of any RAM location can temporarily 
be stored (in D4.B) out of harm's way. However, the 6809 compiler ignores any 
regi ster qualifications (as do implementations for most 8-bit targets) and puts 
temp as an auto variable in a frame, that is in RAM. When that particular address is 
tested, temp will be overwritten by the test pattern. Similarly the pointer address 
itself is in RAM. 


Table 

void RAM_test(void) 
{ 

.define 
ldy 


15.7 An alternative RAM testing module for the 6809 system. 


_asm(" 

_asm(" 

_asm(" 

_asm(" 
_asm("RL00P: 
_asm(" 

_asm(" 

_asm(" 

_asm(" 

_asm(" 

_asm(" 

_asm(" 

_asm(" 

_asm("ERR0R: 

_asm(" 


} 


RAM_START = 0, RAM_LENGTH = 800h\n"); 

#0 ; i held in Y, =0\n"); 

ldb #10101010b ; Test pattern in B\n"); 

clr _z ; Send out all zeros to digital port to signal ok\n"); 

Ida RAM_START,y ; Get mem byte @ RAM_START+i (RAM_START from header)\n"); 

stb RAM_START,y ; Put pattern back out to same location\n"); 

cmpb RAM_START,y ; Did it get there?\n"); 

bne ERROR ; IF not THEN break to error handler\n"); 

sta RAM_START,y ; ELSE put byte back\n"); 

leay l,y ; i++\n"); 

cmpy #RAM_LENGTH+1 ; Finished yet?\n"); 

bne RL00P ; IF not THEN test next byte\n"); 

rts\n") ; 

ldb #llllllllb ; Put out the error code\n"); 
stb _z\n"); 

; Exit via }'s RTS 


In practice this non-regi ster implementation may not cause problems; much 
depends on the code a compiler produces. Rather than run a risk, Table 115.71 
shows an alternative RAM_test() with an embedded assembly-level routine. This 
time the pointer equivalent to address is held in Index register_Y, and temp 
resides in Accumulator_A. 

A diagnostic package is intimately tangled up in the hardware, and, of course, 
should be above reproach. Because of this, special care needs to be taken if 
diagnostic software is written in C. Including some assembly-level code for the 
purpose of diagnosis may well be desirable, as the uncertainty of compiler allo¬ 
cation and action is eliminated. 


1 5.3 In-Circuit Emulation 

The simulation techniques explored in Section 15.1 offer a low-cost solution to 
software debugging. Similarly, the techniques covered in Section 15.2 give an 
inexpensive approach to verifying the target hardware. Testing the interaction of 
these two components is the subject of this section. 

The low-cost approach to this problem is to take the hex file produced by the 
compiler and program an EPROM. With this firmware in situ, the operation of the 
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system can be monitored using the normal hardware tools. A variation of this 
technique uses a ROMulator. This is a RAM pack with a flying lead and DIL plug 
masquerading as an EPROM. Machine code can be downloaded into the ROMulator 
which is plugged into the target EPROM socket. Such software is easier to change 
than firmware and some monitoring of target variables is possible. 

Where a more extensive examination of both hardware and software is neces¬ 
sary, then an in-circuit emulator (ICE) is required fTTI . An ICE is a microprocessor- 
based product which exercises the target hardware under the control of a micro¬ 
processor development system. A typical configuration is shown in Fig. 115.91 
Here the ICE replaces the target microprocessor via an umbilical cord and plug. 
The ICE hosts the same processor as the target, often piggybacked onto the umbil¬ 
ical plug, to be as close to the target as possible. This slave (i.e. target) processor 
is controlled by the ICE master microprocessor, which also communicates with a 
computer via a serial link. Thus, typically the target MPU might be a 68008, with 
a Z80 master and 8086-based computer! 

Many different configurations are possible. Historically the Intel corporation 
invented the ICE in 1975 as part of their development system for the 8080 MPU. 
The ICE-80 was a plug-in card to the Intel Microprocessor Development System 
(MDS) bus. An 8080 processor both emulated, controlled and communicated 
with the user. Most manufacturers followed with their own version, such as 
Motorola's EXORmacs MDS. Some of the large test equipment manufacturers, 
notably Tektronix and Hewlett Packard, developed a general purpose MDS, not 
tied to one specific manufacturer's product fl2l . Here the ICE could be altered 
by changing the plug-in board, pod and associated software. 

With the rise in popularity of the personal computer, the stand-alone configu¬ 
ration of Fig. ll5.9l has become popular. The user interface can be anything from a 
dumb VDU terminal to a workstation or minicomputer. Firmware in the ICE itself 
communicates with this ter mi nal and is used by the master processor. Gener¬ 
ally, changing the target processor involves changing one or more of; the pod, 
firmware, ICE-board, terminal software. 

Although most stand-alone ICEs will operate with a dumb terminal host, the 
internal ROM-based ICE commands are very basic and elementary. Using an in¬ 
telligent terminal, such as a personal computer, allows a much more powerful 
and user-friendly software interface to insulate the user from the complexities of 
the ICE hardware. Aids such as menus and helpful prompts are useful to novice 
users. As with other software aids, the protocols and commands available are 
very product dependent, doubly so here as both hardware and software are in¬ 
volved. The following examples use the Noral SD l[] product |TTI , but the facilities 
available are similar to most products |11| . 

All ICEs permit shadowing of the target's memory map. Thus memory is avail¬ 
able to the slave emulator MPU on-board the ICE. As seen by this slave, its memory 
map can be set in chunks between local internal memory (known as overlay) or 
the target. As an example, consider a target with ROM between 3000 and 3FFF/r, 

^oral Microelectronics, Logic House, Gate St., Blackburn, Lancs, BB1 3AQ, UK. 
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6000 and 7FFFh, and E000 and FFFFh. The rest of its memory space is occupied 
by RAM or memory-mapped peripherals. Normally on power-up all memory is 
mapped to the target of type read/write. To ' move' the three ROM areas into the 
internal overlay ICE memory, use the MMO (Memory Map to Overlay) commands 
thus: 


MMO 3000, 3FFF, P 
MMO 6000, 7FFF, P 
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MMO E000, FFFF, P 

where P stands for write Protected (an error will be printed if the software at¬ 
tempts to write to any of this overlay memory, that is simulating ROM). After this 
is done, the memory map is displayed as shown in Table [uTsl Sixteen blocks 
can be allocated in this manner, in minimum increments of 4 kbytes. Also shown 
in the listing is a memory test of the target RAM lying between C000 and CFFF/i 
(MT is Memory Test). This writes 1 01 01 01 Oh and 01 01 01 01 b into all locations 
as specified. More sophisticated tests are available. 

Using this overlay memory technique, resources may be gradually switched 
from the ICE to the prototype system. Thus a target A/D converter can be ini¬ 
tially mapped to an overlay RAM location and used to test the software. The real 
peripheral can then be exercised by switching from overlay to target. Some ICEs 
even provide an optional local clock, which may be used instead of the target 
clock. 

Facilities typically provided by an ICE are: 

File handling 

Downloading machine-code and symbol files into memory. 

Register and memory examine/change 

To examine and change any microprocessor register, overlay or target memory 
location. 

Step execution 

To execute the directed software in the target environment step by step, usu¬ 
ally displaying registers and other information after each step. 

Breakpoints 

Insertion of conditions, which may be software and/or external hardware sig¬ 
nals, to halt execution. 

Execute 

Full speed execution until a breakpoint is reached. 

Trace analysis 

This can be either a software or real-time trace. In the case of the latter the 
system runs to a breakpoint. At this point the contents of a display buffer can 
be read both before and after this event. The state of various external signals 
can be displayed as well as address, data and control bus signals. Unlike a 
software trace, this data is acquired in real time and only displayed when 
execution has terminated. 

Most of the facilities described above are the same as those listed for software 
simulation in Section 15.1. However, in this case the software is being run in 
its real hardware enviro nm ent using a real microprocessor, possibly in real time. 
Trace analysis is different, however, in that a ' snapshot' of bus cycles and bus 
activity can be obtained. This logic analysis feature is usually rather limited. 
Instead an external logic analyzer can be used, triggered by the ICE itself when a 
breakpoint is reached. 

To illustrate some of these points, consider the example shown in Fig. ll5.9l 
The source for the program is shown in the central window. The smaller window 
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Insert Listing 15.8 here 


Table 15.8 Memory Mapping and Testing. 
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above this allows us to monitor selected memory locations or blocks as we step 
through the program. The MONM (MONitor Memory) commands setting this up 


Insert listing labelled Table 15.9 here 


Table 15.9 A window into the hardware using an ICE. 
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is shown in the command area at the bottom of the page. The register window at 
the top right shows the state of the MPU's registers. The Supervisor Stack Pointer 
(SSP) has been set to 1 0000 h by using the RW (Register Write) command. Below 
this is the state of the System stack. This is useful to examine parameters passed 
to the function or subroutine through the stack and monitoring frame data. 

Clicking a mouse on the [S] or [CO] boxes causes the program to Step or GO 
and execute as appropriate. In the Step mode the line of code being executed is 
highlighted on the screen. 

Although the data presented in Fig. ll5.9l looks si mi lar to that of Table [15711 
remember that the latter is a pure simulation whilst the former is running on an 
actual 68008 microprocessor. 

High-level ICE driven packages are now becoming available, which have the 
same relationship as the low and high-level simulators discussed in Section 15.1. 
Some of these are extensions of existing simulation products which makes mov¬ 
ing between a simulation and emulation environment easier. 

Although an in-circuit emulator is versatile, it is expensive (typically $7000+), 
relatively bulky and fragile. They can also be cantankerous! Thus it makes sense 
to use a simulator at the outset to check out the purely software aspects of the 
project. If testability has been incorporated into the hardware, as described in 
Section 15.2, then the ICE can be left for the final phases of the testing and ' tough 
nut' servicing situations. 
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CHAPTER 1 6 


C'est la Fin 


Having designed and tested our project it only remains to wrap up by doing a 
comparative analysis of the various implementations and giving some sugges¬ 
tions on how the basic specification can be extended. 


16.1 Results 

One of the first questions asked is how will a C-coded program compare with its 
assembly-level equivalent? To try and answer this question I have coded both our 
systems at assembly level, so that we can contrast the two approaches. In defence 
of the expected outcome, it should be pointed out that small routines, especially 
those that intimately interact with hardware, are the forte of assembly-level code 
and the antithesis of high-level languages. Thus our results will be at the far 
end of the spectrum; however, this will at least give us a worst-case yardstick to 
balance the pros and cons of the two approaches. 

Our first demonstration is the 6809-based coding of Table [RTF] This is struc¬ 
tured after the C-level coding of Tables IT4731 and fl 4.41 Like the C program, the 
variables Oldest and Array[] are stored in absolute memory locations and so 
are globally known to both the routines MAIN and UPDATE. 

At the beginning of the scan (lines 18-29) the address of the leftmost Y co¬ 
ordinate, Array [Oldest], is calculated and placed in Index register_X. This cal¬ 
culation is done in lines 18 - 21 by expanding out the 8-bit 01 dest index, pointing 
X to Array[0] and using the instruction LEAX D,X to put the effective address 
01 dest + Array back in X. 

The main scan routine simply uses the Post-Increment Index address mode au¬ 
tomatically to advance this pointer once each time Ar ray [i ] is fetched, prepara¬ 
tory to sending it out to the Y plates. The stratagem of keeping the X count 
in Accumulator_A and fetching Array[i] down to Accumulator_B, means that 
both X and Y co-ordinates can be output together using a single Store Double 
instruction (line 35). 

In incrementing the Array [] pointer, a check is made to detect the situation 
when its value reaches Array [2 56] (Array + 256) and to reset back to the begin¬ 
ning Array [0]. This gives a pseudo circular structure. Thus if 01 dest were 40 h, 
then the leftmost value (X = 00/r) would be Array[40], and when X reached 
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Table 16.1 A 6809-based assembly-level coding. 

1 .processor m6809 

2 ******************* ****************** * * * * * * ****************** * * * * * * * * * 

3 ;* This is the background routine which spends its time sequentially * 

4 ;* going through the 256 array bytes, sending out the value to the * 

5 ;* Y plates whilst ramping up the X plates * 

C ****************** * ******************** * * ******************** * * * * * * * * * 


7 



.define 

ANALOG = 2000h, ANINPUT = 6000h, Z_BLANK = OAOOOh 

8 



.psect 

text 


9 

E000 

10CE0400 MAIN: Ids 

#400h 

Set up stack top 

10 

E004 

7F0000 

cl r 

Oldest 

Start new index at start of array 

11 



for (i=0; i<256; i++) Clear the array 

12 

E007 

8E0003 

ldx 

#Array 

Point to bottom of array 

13 

E00A 

6F80 

CLOOP: clr 

, x+ 

Clear Array[i] , i++ 

14 

E00C 

8C0103 

cmpx 

#Array+256 

Over the top yet? 

15 

E00F 

26F9 

bne 

CLOOP 

IF not THEN again 

16 



while(l) Forever display data to oscilloscope 

17 



First calculate the 

leftmost array element address as Array+Oldest 

18 

E011 

4F 

FOREVER: clra 


Keep X count in Acc.A (=00h) 

19 

E012 

F60000 

ldb 

Oldest 

The leftmost array index 

20 

E015 

8E0003 

ldx 

#Array 

Point IX to bottom of array 

21 

E018 

308B 

leax 

d,x 

01dest+Array=leftmost address 

22 



Now begin the scan, 

using X as a pointer to Array[i] 

23 

E01A 

E680 

DISPLOOP: ldb 

0, x+ 

Get Array[i] , i++ 

24 

E01C 

FD2000 

std 

ANALOG 

Send out X and Y points together 

25 

E01F 

8C0101 

cmpx 

#Array+256 

Over the top? 

26 

E022 

2603 

bne 

CONTINUE 

IF not THEN continue 

27 

E024 

8E0001 

ldx 

#Array 

ELSE point back to the beginning 

28 

E027 

4C 

CONTINUE: inca 


Increment X count 

29 

E028 

26F0 

bne 

DISPLOOP 

IF not back to zero THEN again 

30 



Flyback 


31 

E02A 

4A 

deca 


Make Acc.A = FFh 

32 

E02B 

B7A000 

sta 

Z_BLANK 

Send out to Z port to blank beam 

33 

E02E 

4F 

cl ra 


Prepare to return beam to leftside 

34 

E02F 

E684 

ldb 

0, x 

X back round to the start again 

35 

E031 

FD2000 

std 

ANALOG 


36 

E034 

4A 

DELAY_LOOP: deca 


1 ms minimal delay 

37 

E035 

26FD 

bne 

DELAY LOOP 


38 

E037 

B7A000 

sta 

Z BLANK 

Send out 00000000 to enable beam 

39 

40 

41 

E03A 

20D5 

bra 

FOREVER 

and repeat scan 



******** * * * * * * * * * * * * * ************** * * * * * * * * * * * * * * ************** * * * * * * 

42 



* This is the NMI interrupt service routine which puts the * 

43 



* analog sample in the array and updates the Oldest index * 

44 



* ENTRY : Via NMI. Array[] and Oldest are global * 

45 



* EXIT : Value at ANALOG in Array[01dest], Oldest inc'ed mod-256 * 

46 



* ****** * * * * * * * * * * * * * * * * * ************** * * * * * * * * ************** * * * * * * * * * 

47 



Array[01dest++]=*a 

d 


48 

E03C 

4F 

UPDATE: cl ra 


Prepare to promote Oldest 

49 

E03D 

F60000 

ldb 

Oldest 

to a 16-bit quantity 

50 

E040 

8E0003 

ldx 

#Array 

Point IX to bottom of array 

51 

E043 

308B 

leax 

d,x 

Address of Array[01dest] in IX 

52 

E045 

B66000 

Ida 

ANINPUT 

Read analog sample 

53 

E048 

A784 

sta 

0, x 

Put it out to Array[01dest] 

54 

E04A 

7C0000 

inc 

Oldest 

01dest++, mod-256 

55 

56 

57 

E04D 

3B 

rti 


and exit 



********** * * * * * * * * * * * ******************** * * * * * ************** * * * * * * * * * 

58 



* The Reset and NMI 

vectors, assuming a 2716 EPROM from E000-E7ffh * 

59 



* ****** * * * * * * * * * * * * * 

* *************** * * * * * * * * * * * * * * * * ********* * * * * * * * * 

60 



.org 

MAIN+7fch 

Takes us up to E7Fch, NMI vector 

61 

E7FC 

E03C 

.word 

UPDATE 

UPDATE is the start address of ISR 

62 

63 

64 

E7FE 

E000 

.word 

MAIN 

Reset address 



* ****** * * * * * * * * * * * * * * ******************** * * * * * * * * ************ * * * * * * * * 

65 



* RAM-based variables 

* 

66 



******** * * * * * * * * * * * * * ******************** * * * * * * * * ************ * * * * * * * * 

67 



.psect 

data 


68 

0000 

Oldest: .byte 

[1] 

Reserve one byte for Oldest 

69 

0001 

Array: .byte 

[256] 

and 256 bytes for the array 

70 



.public 

MAIN, CLOOP, FOREVER, DISPLOOP, CONTINUE 

71 



.public 

DELAY LOOP, UPDATE, Oldest, Leftmost, Array 

72 



.public 

ANALOG, ANINPUT, Z BLANK 

73 



. end 
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BFh the Y point would be Array[255]. The next point at X = CO h should be 
Array[0]. 

The NMIISR UPDATE simply computes the address of Array [Oldest] in the 
same way as the leftmost point was calculated, fetches the analog sample down 
into this element and increments the index Oldest with wrap around from FF/i 
to 00h. As the NMI interrupt saves and restores all internal registers, there is no 
restriction on register usage. 

The total length of the routine is 77 bytes plus vectors. The scan time for one 
screen of data, ignoring any interrupt service time, is 7.5 ms, giving a sweep rate 
of 133Hz. 

The 68008-based equivalent is shown in Table 1 1 0.21 Like the C program of 
Table riTTIl variables are preferentially located in registers, with only the global 
object Array [256] being located in memory. The X count is held in DO.B and the 
index to the oldest updated array element in D7.B. Address register_A0 is used in 
the background program to point to the array element currently being fetched, 
whilst A1 is a convenient way of holding the constant address of Array [0]. 

On entry to the scan loop (lines 33-40), AO points to the leftmost array el¬ 
ement to be displayed. After each point is displayed (lines 33-34) both the 
X count (in DO.B) and array pointer (AO.L) are incremented. In the case of the 
latter, wraparound occurs whenever the address reaches 256 above the array 
base (lines 37 and 39). This gives the necessary circular data structure. 

During the flyback delay, the array pointer is reset to the new leftmost point in 
line 44 by using D7.W as an index (the oldest array element) with A1 .L (pointing to 
the base of the array), that is Y le f tmost = ArrayfOl dest]. As t hi s index address 
mode uses a (sign-extended) word-sized index register (byte sized not allowed), 
D7 was originally word-sized cleared in line 22 to ensure no non-zero bits in 
D7[l 5:8] will upset this calculation. 

The level-7 ISR is called UPDATE, and simply uses D7.W (the oldest index) as 
an offset to A1 .L (which is permanently pointing to the array base), to move the 
value from the A/D converter to Array[01 dest]. Adding one to D7.B ensures 
that this points to the array element furthest back in time on exit. The byte-sized 
increment automatically gives wraparound. As none of the registers are saved or 
retrieved by a 68000 MPU interrupt, using D7 and A1 as global register variables 
is legitimate. 

The program totals 102 bytes excluding vectors and takes 5.9ms for one 
screen's worth of data, ignoring any interrupt service time. This gives a sweep 
rate of 169 Hz. 

The final figures then show a size factor of 2.4 for the 6809-based circuit and 
a speed factor of 2.7. The 68008 has a closer size factor of 1.7 together with a 
speed factor of 2.9. If we treat these figures as a worst-case scenario, then for 
realistic situations these factors are likely to be of the order of 1.5 at best; that is 
a C coding will have around 50% more code and be 50% slower than an equivalent 
assembly-level implementation. Against this must be ranged the high-level code 
advantages of cost, portability and reliability. 

Figure fl6Hl shows a typical set of X and Y traces captured on a Hewlett Packard 
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1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 000000 

15 000004 

16 000008 

17 00007C 

18 000080 

19 

20 

21 000400 

22 000402 

23 000408 

24 

25 00040A 

26 00040E 

27 000412 

28 

29 

30 000416 

31 000418 

32 

33 00041C 

34 000420 

35 000424 

36 000426 

37 000428 

38 00042E 

39 000430 

40 000436 

41 

42 000438 

43 00043A 

44 000440 

45 000444 

46 00044A 

47 00044C 

48 000450 

49 000452 

50 000454 

51 00045A 

52 

53 

54 

55 

56 

57 

58 

59 

60 
61 

62 00045C 

63 000462 

64 000464 

65 

66 

67 

68 

69 

70 00E000 

71 

72 

73 


Table 16.2 A 68008-based assembly-level coding. 

.processor m68000 

* ******** * * * * * * * * * * * * * * ******************** * * * * * * * * * * * ************ * * * 

* This is the background routine which spends its time sequentially * 

* going through the 256 array bytes, sending out the value to the * 

* Y plates whilst ramping up the X plates * 

* DO holds X count, D7 holds Oldest * 

* A0 points to the leftmost element, A1 to the base of the array * 

* ******** * * * * * * * * * * * * * * *************** * * * * * * * * * * * * * ************ * * * * * * 

.define ANALOG_X = 2000h, ANAL0G_Y = 2001h, ANINPUT = 6000h, 
Z_BLANK = OAOOOh 
.psect _text 


; 

The vector 

table 



00010000 

VECTOR: 

.double 

lOOOOh 

Initial value of Supervisor SP 

00000400 


.double 

MAIN 

and of the PC 

00000000 


.double 

[29] 

skip until 

0000045C 


.double 

UPDATE 

Level-7 vector 

00000000 


.double 

[224] 

Skip until start of program @0400h 

; 

Program proper starts here 


4247 

MAIN: 

cl r.w 

d7 

Start Oldest index as zero 

227C0000E000 

movea.1 

#Array,al 

The constant address of array start 

2049 


movea.1 

al,aO 

is the leftmost point first time in 

| 

for Ci=255 

i>-1; i 

--) Clear the array 

323C00FF 


move.w 

#255,dl 

Use Dl as a loop count i 

42301000 

CLOOP: 

cl r.b 

0(a0,dl.w) 

Clear Array[i] 

51C9FFFA 


dbf 

dl,CLOOP 

IF not THEN again 

; 

while(l) Forever display data to oscilloscope 

‘ 

First calculate the 

leftmost array 

element address as Array+Oldest 

4200 

FOREVER: 

cl r.b 

dO 

X-count = OOh 

41F07000 


lea 

0(a0,d7.w),a0 

AO holds leftmost element address 

| 

Now begin the scan 

with AO pointing to Array[i] 

11C02000 

DISPLOOP 

move.b 

dO,ANALOG_X 

Send out X-co-ordinate to screen 

11D82001 


move.b 

(a0)+,ANALOG Y 

& Y-co-ord; inc'ed array pointer 

5200 


addq. b 

#1, dO 

Increment X-co-ordinate 

6710 


beq 

FLYBACK 

IF back to zero THEN scan finished 

B1FC0000E100 

cmpa.l 

#Array+256,aO 

AO over the array top? 

66EC 


bne 

DISPLOOP 

IF not THEN next point 

207C0000E000 

movea.1 

#Array,aO 

ELSE go round to the first element 

60E4 

Flyback 

bra 

DISPLOOP 

and go again 

5300 

FLYBACK: 

subq.b 

#1, dO 

Make DO.b = FFh 

13C00000A000 

move.b 

dO,Z_BLANK 

Send out to Z port to blank screen 

41F17000 


lea 

0(al,d7.w),a0 

A1+D7 is leftmost address, -> AO 

11E800002001 

move.b 

0(a0),ANALOG Y 

Send it out to the Y-plates 

4200 


cl r.b 

dO 

X-co-ordinate is zero 

11C02000 


move.b 

dO,ANALOG_X 

at left side of screen 

5300 

DELAY_LOOP 

subq.b 

#1, dO 

1 ms nominal delay 

66FC 


bne 

DELAY LOOP 

13C00000A000 

move.b 

dO,Z BLANK 

Send out 00000000 to enable beam 

60C0 


bra 

DISPLOOP 

and repeat scan 


■ ************* * * * * * * * * * * * *************** * * * * * * * * * * * * * * * * * * ****** * * * * * * 

;* This is the level-7 interrupt service routine which puts the * 

;* analog sample in the array and updates the Oldest index * 

;* ENTRY : Via INT-7. Array[] is global and D7.B holds Oldest pointer* 

;* ENTRY : A1 points to the array bottom Array[0] * 

;* EXIT : Value at ANALOG in Array[01dest], Oldest inced mod-256 * 

■ * * * * * ************** * * ****************** * * * * * * * * * * * * * * * * ******** * * * * * * 

; Array[01dest++]=*a_d 

; Overwrite Array[01dest] with latest data from A/D converter 
13B860007000 UPDATE: move.b ANINPUT,0(al,d7.w); Send sample to Array[01dest] 
5207 addq.b #l,d7 ; Increment Oldest index modulo-256 

4E73 rte 

■ * * * * * ************** * * ****************** * * * * * * * * * * * * * * * * ******** * * * * * 

;* RAM-based variables * 

■ * * * * * ********* * * * * * * * * * * * * * * * ******** * * * * * * * * * * * * * * * * * * * * ****** * * * * * 

.psect _data 

Array: .byte [256] ; 256 bytes for the array 

.public MAIN, CLOOP, FOREVER, DISPLOOP, DELAY_LOOP, UPDATE 
.public FLYBACK, Array, ANALOG_X, ANALOG_Y, ANINPUT, Z_BLANK 
. end 
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Please insert Fig. 16.1 here. 


Figure 16.1 Typical X and Y waveforms, showing two ECG traces covering 2s. 

54501A digitizing oscilloscope. The upper trace shows the contents of the 256- 
byte array covering approximately two seconds. The bottom trace shows the 
X sweep. The flyback blank has been deliberately increased to give a refresh rate 
of 50Hz. 


16.2 More Ideas 

The time-compressed memory project used to illustrate the use of a high-level 
language for an embedded target originated as part of a commercial project, but 
has been suitably watered down to avoid the perennial problem of ' not being 
able to see the wood for the trees'. That being so, there is plenty of scope for a 
more ambitious project based on this basic core. In this section a few ideas are 
presented which should help fertilize the reader's mind in planning any further 
work. 

One relatively simple extension involving only a software change is to increase 
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the amount of data presented on the screen. Four minutes worth can be displayed 
by using two traces, which are scanned in succession. 

The overall double scan of the complete four minutes, stored in a 512-byte 
array, will still have to be accomplished in 20 ms (10 ms per trace) to give a flicker- 
free display. 

The apparently two separate traces can be simulated by reducing the vertical 
resolution to seven bits. The top trace is displayed with a MSB of 1, w hi lst the 
bottom 256 data bytes have a MSB of 0. The Y-output D/A converter will then 
bias the first 256 data bytes by \ scale. Of course the data bytes must first be 
logic shifted once right (divided by two) before the MSB is tampered with. 

As the 512 data points need to be displayed in the time previously required 
for 256 points, the processor will have to work twice as hard. If the software is 
coded in C then it is doubtful if there is sufficient reserve. Renovating the circuit 
to use a 2 MHz 68B09 (with faster EPROM) or a 12.5 MHz 68000 MPU will provide 
the additional horsepower if this is the case. However, this is probably a good 
case for using an assembly-level coding. It does work! 

Using a bidirectional X-sweep would slightly reduce the scan time. By display¬ 
ing, say, the bottom trace from left to right and then returning along the top trace 
right to left, the flyback and Z-blank delays are eliminated. Using a triangular in¬ 
stead of sawtooth timebase is a standard technique implemented by printers. It is 
feasible to use this bidirectional scan for the single trace of the basic project, but 
as the traces will be superimposed, the oscilloscope must not exhibit appreciable 
hysteresis, in my experience a problem with low-cost oscilloscopes. 

Continuing with this theme, the freeze facility can be applied to only one of 
these traces, say, the upper, whilst the lower continues on as normal. 

Another approach to displaying additional data is to use hidden pages with 
only one or two on-screen traces. Thus, for example, we could store all data 
for the last 32 mi nutes in a 32 kbyte RAM, but only display the last two minutes 
worth. However any of the 2-minute pages from time past could be displayed 
as commanded using the setting of the switch port. Furthermore a hard-copy 
routine could be written to dump the entire 32-minutes worth sequentially to a 
chart recorder or graphics printer, or even uploaded to a PC for further analysis. 
The option of invisibly acquiring data as this process is in progress, by using 
a shadow RAM, is useful. Indeed this option is also a possibility for the freeze 
process in the main project. Thus the freeze command makes the display static, 
but data continues to be acquired behind the scenes'. 

Depending on the quality of the analog data, it maybe desirable to apply a sim¬ 
ple digital smoothing routine at the output (e.g. see the 3-point filter of page[246). 
Regardless of such processes, an 8-bit quantized system will look rather granular 
in hard copy. With a 12-bit A/D converter, the number of quantization levels in¬ 
creases from 256 to 4096. Unfortunately this will require a si mi lar enhancement 
of RAM capacity from 256 byte-sized elements to 4096 word-sized elements for 
each 2-minute slot, a 32-fold increase! 

Displaying a 4096 element data array in 20 ms is well above the capabilities of 
any of the processors used in this text. Rather, every other 16th element could be 
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used for a conventional 8-bit oscilloscope display and the full resolution reserved 
for the hard-copy or uploaded version, where time is not an issue. Even so, the 
extra overhead of a x 16 interrupt rate, reading and writing 12-bit quantities over 
an 8-bit bus (e.g. see Fig. 16. ID and extracting one in sixteen bytes is onerous. 

With the processor pretty well spending its entire time displaying the wave¬ 
form, there is no spare capacity available to analyze the data. A second processor 
running in parallel with the display processor would enable both functions to be 
carried out in real time. Data acquired by the master processor could be sent to 
the slave by writing the latest sample to an output port, then interrupting the 
slave which reads this as an input port. Of course the option of uploading, say, 
via the serial link, is a viable alternative if the data rate is not too high. 

Analysis tasks include detecting waveform peaks and calculating beat rates 
and beat-to-beat variations. A separate display of the appropriate data could be 
maintained by the slave or multiplexed on to the primary display. This display 
device need not be CRO-based, a liquid crystal panel is a viable alternative, and 
will probably contain its own microprocessor. 


Appendix A 


Acronyms and Abbreviations 


A 

Accumulator_A 

A/D 

Analog to Digital converter 

B 

Accumulator_B 

BIOS 

Basic Input/Output System 

CCR 

Code Condition Register 

D 

Accumulator_D 

D/A 

Digital to Analog converter 

DMA 

Direct Memory Access 

ea 

Effective Address 

ECG 

Electro-CardioGram trace 

EKG 

See ECG 

EPROM 

Erasable Programmable Read-Only Memory 

ICE 

In-Circuit Emulator 

I/O 

Input/Output 

ISR 

Interrupt Service Routine 

K 

1024 = 2 10 

LSB 

Least Significant Bit or Byte 

Op-code 

Operation code 

OS 

Operating System 

PC 

Personal Computer 

PC 

Program Counter 

PI A 

Peripheral Interface adapter 

PI/T 

Parallel Interface Timer 

M 

1,048,576 = 2 20 

MDS 

Microprocessor Development System 

MSB 

Most Significant Bit or Byte 

MSDOS 

Microsoft Disk Operating System 

OS 

Operating System 

RAM 

Random Access Memory 

ROM 

Read-Only Memory 

S 

System Stack Pointer register 

SBC 

Single-Board Computer 

S/N 

Signal to Noise ratio 

S/H 

Sample and Hold 

SP 

Stack Pointer register 
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SSP System Stack Pointer register 

TOF Top Of Frame 

TOS Top Of Stack 

TTL Transistor Transistor Logic 

U User Stack Pointer register 

USP User Stack Pointer register 

VDU Visual Display Unit 

X Index register X 

Y Index register Y 



